Data center failure management in an SDN deployment using switching node control

ABSTRACT

A data center failure management system and method in a Software Defined Networking (SDN) deployment. In one embodiment, an SDN controller associated with the data center is configured to learn new flows entering the data center and determine which flows require flow stickiness. Responsive to the determination, the SDN controller generates commands to one or more switching nodes and/or one or more border gateway nodes to redirect the sticky flows arriving at the switching nodes via ECMP routes from the gateway nodes, or to cause the gateway nodes to bypass the ECMP routes, in order to overcome certain failure conditions encountered in the data center, an external network, or both.

TECHNICAL FIELD

The present disclosure generally relates to communications networks. More particularly, and not by way of any limitation, the present disclosure is directed to a system and method for managing data center failures in a Software Defined Networking (SDN) deployment using switching node control.

BACKGROUND

Demand for dynamic scaling and benefits from economies of scale are driving the creation of mega data centers to host a broad range of services such as Web search, e-commerce, storage backup, video streaming, high-performance computing, and data analytics, to name a few. To host these applications, data center networks need to be scalable, efficient, fault tolerant, and easy to manage. Recognizing this need, the research community has proposed several architectures to improve scalability, reliability and performance of data center networks, e.g., deployment of redundant network nodes, load balancing using Equal Cost Multi Path (ECMP) routing, failover, etc.

Technologies such as Software Defined Networking (SDN) and Network Function Virtualization (NFV) are transforming traditional networks into software programmable domains running on simplified, lower cost hardware, driving the convergence of IT and telecom markets. This convergence is expected to overhaul network operations, enable new services and business models, and impact existing data center solutions.

Whereas advances in technologies such as SDN, NFV, and cloud-based service hosting continue to grow apace, several lacunae remain in the field of data center failure management, thereby requiring further innovation as will be set forth hereinbelow.

SUMMARY

The present patent disclosure is broadly directed to systems, methods, apparatuses, devices, and associated non-transitory computer-readable media and network architecture for effectuating a failure management system and method operative in a data center with SDN architecture. In one embodiment, an SDN controller associated with the data center is configured to learn new flows entering the data center and determine which flows require flow stickiness. Responsive to the determination, the SDN controller generates commands to one or more switching nodes and/or one or more border gateway nodes of the data center to redirect the sticky flows arriving at the switching nodes via ECMP routes from the gateway nodes, or to cause the gateway nodes to bypass the ECMP routes, in order to overcome certain failure conditions encountered in the data center, an external network coupled to the data center, or both.

In one aspect, an embodiment of a method operating at an SDN controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes is disclosed. The claimed method comprises, inter alia, receiving a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network. Responsive to determining or otherwise obtaining information that the packet flow requires flow stickiness with respect to the subscriber session, the SDN controller is configured to compute bypass routing paths for the packet flow and advertise the routing paths to the gateway nodes using Border Gateway Protocol (BGP) Flow Specification (BGP-FS), wherein the bypass routing paths identify the first switching node having flow stickiness as the routing destination.

In one variation, the claimed method further comprises generating instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets. In another variation, the bypass routing paths for the packet flow advertised to the gateway nodes using BGP-FS are accorded a higher priority than ECMP routing paths with load balancing.

In another aspect, an embodiment of an SDN controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes is disclosed. The claimed SDN controller comprises, inter alia, one or more processors, and a persistent memory module coupled to the one or more processors and having program instructions thereon, which when executed by the one or more processors perform the following: generate instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets; responsive to the instructions, receive a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network; determine or obtain information that the packet flow requires flow stickiness with respect to the subscriber session; and responsive thereto, compute routing paths for the packet flow and advertise the routing paths to the gateway nodes using BGP-FS, the routing paths having a switching node destination pointing to the first switching node having flow stickiness.

In another aspect, an embodiment of a method operating at a gateway node of a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes and controlled by an SDN controller is disclosed. The claimed method comprises, inter alia, receiving routing paths from the SDN controller advertised using BGP-FS for a plurality of packet flows each having an n-tuple identifier set associated therewith, wherein each routing path identifies a switching node as a destination for a corresponding packet flow. A matching Forwarding Information Base (FIB) (also referred to as FS-FIB) is populated or installed with the routing paths advertised via BGP-FS from the SDN controller. When packets arrive at the gateway node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set, a determination is made as to whether the particular n-tuple identifier set of the particular packet flow matches an entry in the matching FIB. If so, the packets of the particular packet flow are forwarded to a corresponding switching node identified in the routing path associated with the particular n-tuple identifier set instead of routing the particular packet flow according to an ECMP path computation. In one variation, the claimed gateway method further includes, responsive to determining that the particular n-tuple identifier set of the particular packet flow does not have an entry in the matching FIB, forwarding the packets of the particular packet flow to a Longest Prefix Match (LPM) Forwarding Information Base (FIB) for determining an ECMP routing path to a switching node of the data center.

In a related aspect, an embodiment of a gateway node associated with a data center having a plurality of switching nodes coupled in a data center network fabric and controlled by an SDN controller is disclosed. The gateway node comprises, inter alia, one or more processors, and a persistent memory module coupled to the one or more processors and having program instructions thereon, which perform the following when executed by the one or more processors: populate a matching Forwarding Information Base (FIB) with routing paths from the SDN controller advertised using BGP-FS for a plurality of packet flows each having an n-tuple identifier set associated therewith, wherein each routing path identifies a switching node as a destination for a corresponding packet flow; receive packets at the gateway node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set; determine if the particular n-tuple identifier set of the particular packet flow matches an entry in the matching FIB; and if so, forward the packets of the particular packet flow to a corresponding switching node identified in the routing path associated with the particular n-tuple identifier set instead of routing the particular packet flow pursuant to an ECMP path. In one variation, the gateway node further includes program instructions configured, responsive to determining that the particular n-tuple identifier set of the particular packet flow does not have an entry in the matching FIB, to forward the packets of the particular packet flow to an LPM FIB for determining an ECMP routing path to a switching node of the data center. In another variation, the gateway node further comprises program instructions for deleting the entry corresponding to the particular n-tuple identifier set of the particular packet flow from the matching FIB responsive to at least one of a command from the SDN controller, reaching a timeout value, and determining that the subscriber session has been terminated.

In a still further aspect, an embodiment of a method operating at an SDN controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes is disclosed. The claimed method comprises, inter alia, receiving a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network, and determining or obtaining information that the packet flow requires flow stickiness with respect to the subscriber session. Responsive thereto, the method operates by programming each switching node (e.g., preferably other than the first switching node) to redirect any packets of the packet flow received by the switching nodes to the first switching node, the packets arriving at the switching nodes from at least a subset of the plurality of gateway nodes according to respective ECMP routing paths. In one variation, the method further includes programming each switching node by the SDN controller to redirect the packets of the packet flow by sending one or more OpenFlow specification match-action rules to the switching nodes that identify the first switching node as a destination node for that n-tuple flow. In another variation, the method further includes generating instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets.

In another aspect, an embodiment of an SDN controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes is disclosed. The claimed SDN controller comprises, inter alia, one or more processors, and a persistent memory module coupled to the one or more processors and having program instructions thereon, which when executed by the one or more processors perform the following: generate instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets; responsive to the instructions, receive a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network; determine that the packet flow requires flow stickiness with respect to the subscriber session; and responsive to the determining, program each switching node (e.g., preferably other than the first switching node) to redirect any packets of the packet flow received by the switching nodes to the first switching node, the packets arriving at the switching nodes from at least a subset of the plurality of gateway nodes according to respective ECMP routing paths. In a related variation, the SDN controller further includes program instructions configured to program the switching nodes by sending one or more OpenFlow specification match-action rules that identify the first switching node as a destination node.

In a still further aspect, an embodiment of a method operating at a switching node of a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes and controlled by an SDN controller is disclosed. The claimed method comprises, inter alia, forwarding first packets of new packet flows received at the switching node to the SDN controller, the first packets each containing respective n-tuple identifier sets of corresponding packet flows and flowing from one or more gateway nodes pursuant to respective subscriber sessions via an external network; populating a flow identification database for identifying the new packet flows as they arrive at the switching node; receiving packets at the switching node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set; determining that the particular packet flow is not a new packet flow by comparing the particular n-tuple identifier set of the particular packet flow against the flow identification database; responsive to the determining, applying a programming rule provided by the SDN controller with respect to the particular packet flow, the programming rule operating to identify a destination switching node other than the switching node receiving the particular packet flow; and redirecting the packets of the particular packet flow to the destination switching node identified according to the programming rule. In one variation, the programming rule comprises at least one match-action rule provided by the SDN controller according to the OpenFlow specification.

In a related aspect, an embodiment of a switching node associated with a data center having a plurality of gateway nodes and a plurality of switching nodes coupled in a data center network fabric and controlled by an SDN controller is disclosed. The claimed switching node comprises, inter alia, one or more processors, and a persistent memory module coupled to the one or more processors and having program instructions thereon, which perform the following when executed by the one or more processors: forward first packets of new packet flows received at the switching node to the SDN controller, the first packets each containing respective n-tuple identifier sets of corresponding packet flows and flowing from one or more gateway nodes pursuant to respective subscriber sessions via an external network; populate a flow identification database for identifying the new packet flows as they arrive at the switching node; receive packets at the switching node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set; determine that the particular packet flow is not a new packet flow by comparing the particular n-tuple identifier set of the particular packet flow against the flow identification database; responsive to the determining, apply a programming rule provided by the SDN controller with respect to the particular packet flow, the programming rule operating to identify a destination switching node other than the switching node receiving the particular packet flow; and redirect the packets of the particular packet flow to the destination switching node identified according to the programming rule.

In still further aspects, an embodiment of a system, apparatus, or network element is disclosed which comprises, inter alia, suitable hardware such as processors and persistent memory having program instructions for executing an embodiment of any of the methods set forth herein.

In still further aspects, one or more embodiments of a non-transitory computer-readable medium or distributed media containing computer-executable program instructions or code portions stored thereon are disclosed for performing one or more embodiments of the methods of the present invention when executed by a processor entity of a network node, apparatus, system, network element, and the like, mutatis mutandis. Further features of the various embodiments are as claimed in the dependent claims.

Example embodiments set forth herein advantageously provide improved failure management in scalable data center environments. Since standards-based solutions are effectuated using, e.g., BGP and OpenFlow specifications, an embodiment of the present invention may be readily deployed without requiring expensive hardware/software mirroring while accommodating current ECMP-based implementations. Further, example embodiments also avoid the complex state synchronization required in some of the existing data center technologies. Additional benefits and advantages of the embodiments will be apparent in view of the following description and accompanying Figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the Figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references may mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

The accompanying drawings are incorporated into and form a part of the specification to illustrate one or more exemplary embodiments of the present disclosure. Various advantages and features of the disclosure will be understood from the following Detailed Description taken in connection with the appended claims and with reference to the attached drawing Figures in which:

FIG. 1 depicts a generalized data center network environment wherein one or more embodiments of the present invention may be practiced;

FIG. 2 depicts an example data center with SDN control deployment according to an embodiment of the present invention;

FIG. 3 depicts an example load balancing scenario in the data center of FIG. 2;

FIGS. 4-6 depict example failure scenarios in the data center of FIG. 2 that may be overcome according to one or more embodiments of the present invention;

FIG. 7 is a flowchart illustrative of various blocks, steps and/or acts of a method operating at a data center that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure;

FIGS. 8A-8C are flowcharts illustrative of various blocks, steps and/or acts that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure;

FIG. 9 is a flowchart illustrative of various blocks, steps and/or acts that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure;

FIGS. 10A and 10B depict example pipeline structures and associated components of a switching node and a gateway node, respectively, according to an embodiment of the present invention;

FIGS. 11A and 11B are flowcharts illustrative of various blocks, steps and/or acts that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure;

FIG. 12 depicts an example packet structure and associated match-action rule(s) illustrative for purposes of an embodiment of the present invention;

FIGS. 13-15 depict example solution scenarios in the data center of FIG. 2 according to one or more embodiments of the present invention;

FIG. 16 depicts a network function virtualization (NFV) architecture that may be implemented in conjunction with a data center deployment of the present invention;

FIGS. 17A/17B illustrate connectivity between network devices (NDs) of an exemplary data center and/or associated network environment, as well as three exemplary implementations of the NDs, according to some embodiments of the present invention; and

FIG. 18 depicts a block diagram of a computer-implemented platform or apparatus that may be (re)configured and/or (re)arranged as a data center switching node, gateway node and/or associated SDN controller according to an embodiment of the present invention.

DETAILED DESCRIPTION

In the description herein for embodiments of the present invention, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the present invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the present invention. Accordingly, it will be appreciated by one skilled in the art that the embodiments of the present disclosure may be practiced without such specific components. It should be further recognized that those of ordinary skill in the art, with the aid of the Detailed Description set forth herein and taking reference to the accompanying drawings, will be able to make and use one or more embodiments without undue experimentation.

Additionally, terms such as “coupled” and “connected,” along with their derivatives, may be used in the following description, claims, or both. It should be understood that these terms are not necessarily intended as synonyms for each other. “Coupled” may be used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” may be used to indicate the establishment of communication, i.e., a communicative relationship, between two or more elements that are coupled with each other. Further, in one or more example embodiments set forth herein, generally speaking, an element, component or module may be configured to perform a function if the element may be programmed for performing or otherwise structurally arranged to perform that function.

As used herein, a network element (e.g., a router, switch, bridge, etc.) is a piece of networking equipment, including hardware and software, that communicatively interconnects other equipment on a network (e.g., other network elements, end stations, etc.). Some network elements may comprise “multiple services network elements” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer-2 aggregation, session border control, Quality of Service, and/or subscriber management, and the like), and/or provide support for multiple application services (e.g., data, voice, and video). Subscriber/tenant end stations (e.g., servers, workstations, laptops, netbooks, palm tops, mobile phones, smartphones, tablets, phablets, multimedia phones, Voice Over Internet Protocol (VoIP) phones, user equipment, terminals, portable media players, GPS units, gaming systems, set-top boxes, etc.) may access or consume resources/services, including cloud-centric resources/services, provided over a multi-domain, multi-operator heterogeneous network environment, including, e.g., a packet-switched wide area public network such as the Internet via suitable service provider access networks, wherein a data center may be managed according to one or more embodiments set forth hereinbelow. Subscriber/tenant end stations may also access or consume resources/services provided on virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet. Typically, subscriber/tenant end stations may be coupled (e.g., through customer/tenant premises equipment or CPE/TPE coupled to an access network (wired or wirelessly)) to edge network elements, which are coupled (e.g., through one or more core network elements) to other edge network elements, and to cloud-based data center elements with respect to consuming hosted resources/services according to service management agreements, contracts, etc.

One or more embodiments of the present patent disclosure may be implemented using different combinations of software, firmware, and/or hardware. Thus, one or more of the techniques shown in the Figures (e.g., flowcharts) may be implemented using code and data stored and executed on one or more electronic devices or nodes (e.g., a subscriber client device or end station, a network element and/or a management node, etc.). Such electronic devices may store and communicate (internally and/or with other electronic devices over a network) code and data using computer-readable media, such as non-transitory computer-readable storage media (e.g., magnetic disks, optical disks, random access memory, read-only memory, flash memory devices, phase-change memory, etc.), transitory computer-readable transmission media (e.g., electrical, optical, acoustical or other forms of propagated signals, such as carrier waves, infrared signals, digital signals), etc. In addition, such network elements may typically include a set of one or more processors coupled to one or more other components, such as one or more storage devices (e.g., non-transitory machine-readable storage media) as well as storage database(s), user input/output devices (e.g., a keyboard, a touch screen, a pointing device, and/or a display), and network connections for effectuating signaling and/or bearer media transmission. The coupling of the set of processors and other components may be typically through one or more buses and bridges (also termed as bus controllers), arranged in any known (e.g., symmetric/shared multiprocessing) or heretofore unknown architectures. Thus, the storage device or component of a given electronic device or network element may be configured to store code and/or data for execution on one or more processors of that element, node or electronic device for purposes of implementing one or more techniques of the present disclosure.

Referring now to the drawings and more particularly to FIG. 1, depicted therein is a generalized data center (DC) network environment 100 wherein one or more embodiments of the present invention with respect to managing data center failures may be practiced. By way of illustration, a plurality of subscribers and/or associated end stations or user equipment (UE) devices 104-1 to 104-N are connected to a packet-switched network 106 (e.g., the Internet) for accessing cloud-based resources, services and applications (e.g., data/media/video applications, storage/compute/network resources, etc.) disposed in one or more data centers 116, 136. In one arrangement, the cloud-based data centers 116, 136 may be provided as part of a public cloud, a private cloud, or a hybrid cloud, and for facilitating subscriber sessions with respect to the consumption of one or more hosted services, applications and/or resources using any combination of end stations 104-1 to 104-N. Further, data centers 116/136 may be architected using Software Defined Networking (SDN) infrastructure and/or network function virtualization architecture, e.g., operating with protocols such as, without limitation, OpenFlow (OF) protocol, Forwarding and Control Element Separation (ForCES) protocol, OpenDaylight protocol, Multiprotocol Border Gateway Protocol (MP-BGP), and the like, with respect to providing overall management and control of various data center nodes, elements, functions, service instances, etc. An example SDN architecture typically involves separation and decoupling of the control and data forwarding planes of the network elements, whereby network intelligence and state control may be logically centralized and the underlying network infrastructure is abstracted from the applications. One implementation of an SDN-based network architecture may therefore comprise a network-wide control platform, executing on one or more servers, which is configured to oversee and control a plurality of data forwarding nodes such as routers, switches, etc. Accordingly, a standardized interfacing may be provided between the network-wide control platform (which may be referred to as “SDN controller” 152 for purposes of some embodiments of the present patent application) and various components of data centers 116, 136, thereby facilitating high scalability, flow-based traffic control, multi-tenancy and secure infrastructure sharing, virtual overlay networking, efficient load balancing, and the like. In an example network environment, SDN controller 152 may be interfaced with suitable internal and/or external management layer entities such as, e.g., Operations Support Systems (OSS) and/or Business Support Systems (BSS), customer/subscriber policy management systems, content provider policy management systems, etc. A management interface 164 between SDN controller 152 and such management nodes 154 may facilitate service provisioning, Quality of Service (QoS) and Class of Service (CoS) requirements, etc., with respect to different services and subscriber flows.

As SDN-compatible environments, data centers 116, 136 may be implemented in an example embodiment as an open source cloud computing platform for public/private/hybrid cloud arrangements, e.g., using OpenStack and Kernel-based Virtual Machine (KVM) virtualization schemes. Example data center virtualization may involve providing a virtual infrastructure for abstracting or virtualizing a large array of physical resources such as compute resources (e.g., server farms based on blade systems), storage resources, and network/interface resources, wherein specialized software called a Virtual Machine Manager (VMM) or hypervisor allows sharing of the physical resources among one or more virtual machines (VMs) or guest machines executing thereon. Each VM or guest machine may support its own OS and one or more instances of services, applications, etc., and one or more VMs may be logically organized into a Virtual Extensible LAN (VxLAN) using an overlay technology (e.g., a VLAN-like encapsulation technique to encapsulate MAC-based OSI Layer 2 Ethernet frames within Layer 3 UDP packets) for achieving further scalability. By way of illustration, data centers 116 and 136 are exemplified with respective physical resources 118, 138 and VMM or hypervisors 120, 140, and respective pluralities of VMs 126-1 to 126-N and 146-1 to 146-M that are logically connected in respective VxLANs 124 and 144. As a further illustration, each VM may support one or more instances of services/applications, e.g., generally, application(s) 128 executing on VM 126-1 and application(s) 130 executing on VM 126-N in respect of data center 116, and application(s) 148 executing on VM 146-1 and application(s) 150 executing on VM 146-M in respect of data center 136.

Example data centers 116, 136 may also be provided with different types of hypervisor deployment schemes depending on virtualization. Hypervisors 120, 140 may be deployed either as a Type I or “bare-metal” installation (wherein the hypervisor communicates directly with the underlying host physical resources) or as a Type II or “hosted” installation (wherein the hypervisor may be loaded on top of an already live/native OS that communicates with the host physical infrastructure). Regardless of the hypervisor implementation, each data center 116, 136 may be provided with a plurality of SDN-compatible virtual switches or switching nodes 122, 142 (e.g., Open vSwitch or OVS nodes in one example implementation, as noted further below) that may be configured for facilitating access to service instances with respect to the packet flows routed from a plurality of data center border nodes such as gateway routers 102-1 to 102-K via an internal network fabric 115, 135. In one arrangement, a data center's VS nodes may be deployed as part of its hypervisor, e.g., as illustrated in data center 116. In another arrangement, VS nodes may be provided as part of respective VMs or guest machines executing thereon, as illustrated in data center 136. With respect to the latter configuration, it should be appreciated that in some instances a VM's OS may or may not include the capability to support a virtual switch (or specifically an OVS), and accordingly, a VM-based OVS configuration may be somewhat constrained by the OS capabilities. Control paths 166A, 166B exemplify BGP-based messaging paths between SDN controller 152 and gateway nodes 102-1 to 102-K, whereas control paths 168, 170 exemplify OF-based messaging paths between SDN controller 152 and switching nodes of the data center implementations 116, 136, respectively, whose messaging functionalities will be further elucidated hereinbelow for purposes of one or more embodiments of the present invention.

Regardless of a particular SDN/virtualization architecture, example data centers 116, 136 may be organized based on a multi-layer hierarchical network model which may generally include three layers of hierarchy: a core layer (typically characterized by a high degree of redundancy and bandwidth capacity, optimized for high availability and performance), an aggregation layer that may be characterized by a high degree of high-bandwidth port density capacity (optimized for traffic distribution and link fan-out capabilities to access layer switches), and an access layer serving to connect host/server nodes to the network infrastructure. In one embodiment, example nodes in an aggregation layer may be configured to serve functionally as a boundary layer between OSI Layers 2 and 3 (i.e., an L2/L3 boundary) while the access layer elements may be configured to serve at the L2 level (e.g., LANs or VLANs).

To illustrate various embodiments of the present invention, different architectural implementations of an SDN-based data center may be abstracted as a simplified example data center arrangement 200 shown in FIG. 2, wherein a plurality of subscriber/tenant packet flows may be routed from border gateway nodes to one or more switching nodes using routing protocols such as, e.g., ECMP, that support load balancing within the data center's network fabric. Skilled artisans will recognize that the data center 200 is a high level view of the data center portions 116, 136 of the network environment 100 described above in reference to FIG. 1, and accordingly, at least parts of the foregoing description are equally applicable to an example implementation of the data center 200, mutatis mutandis, with a particular focus on SDN deployment as will be set forth in additional detail hereinbelow.

In one implementation, example data center arrangement 200 may be configured for extending tenant L3 networks all the way into virtual switches, e.g., vSwitch nodes 206-1, 206-2, in order to overcome the scalability issue of traditional L2-only tenant networking in data centers. A data center fabric 202 is disposed for interconnecting border gateway nodes, e.g., DC-GW1 208-1 and DC-GW2 208-2 (generally illustrative of a plurality of gateways of the data center), and vSwitch nodes 206-1, 206-2 (generally illustrative of a plurality of switching nodes of the data center) that support various services/applications in multiple instances, as exemplified hereinabove with respect to FIG. 1. Nodes 210-1 and 210-2 hosting/running the service instances are respectively coupled to vSwitch nodes 206-1, 206-2. SDN controller 204 is operative to control vSwitch nodes 206-1, 206-2 using OpenFlow protocol control paths 252 in an example embodiment. Further, SDN controller 204 is configured to interface with gateway nodes 208-1, 208-2 using MP-BGP, which can allow different types of address families, e.g., IPv4 and IPv6 addresses as well as unicast and multicast variants thereof. In addition, BGP protocol paths 250 may be advantageously used to advertise bypass routes to gateway nodes for select packet flows with respect to certain embodiments as will be set forth in additional detail below.

Broadly, in one arrangement, vSwitches 206-1, 206-2 may be implemented as part of one or more software application platforms that allow communication between various VMs, e.g., roughly similar to the arrangements exemplified in FIG. 1, although they may be embedded as part of a server's hardware, e.g., as firmware. As a further variation, an Open vSwitch (OVS) implementation may involve providing a distributed virtual multilayer switch platform that facilitates a switching stack while supporting multiple protocols and standards in a virtualization environment. Example management interfaces and protocols operative in an OVS implementation may include, but are not limited to, NetFlow, sFlow, SPAN, RSPAN, CLI, LACP and 802.1ag.

Whereas the example data center arrangement 200 may include vSwitches 206-1, 206-2 acting as OpenFlow switches, it should be appreciated that such an implementation is not necessary in some embodiments. Skilled artisans will recognize that even a Top of Rack (ToR) switch can be configured as an OpenFlow switch. Typically, in case of a vSwitch-based implementation, tenant/subscriber workloads (e.g., services/applications) would be hosted inside VMs and suitable containers, whereas bare metal servers are connected with ToR switches.

Example data center arrangement 200 may employ protocols such as ECMP with load balancing and gateway redundancy with respect to the incoming traffic so as to improve scalability and reliability of the network. Various subscriber/tenant packet flows, illustrated as packet flow groups 260, 262, entering from one or more external networks (e.g., public networks, private networks, enterprise networks, extranets, intranets, the Internet, and/or any combination thereof) via gateway nodes 208-1, 208-2 may therefore be load balanced (e.g., by using hash-based distribution), wherein packet flows may be routed to different switching nodes with a view to balancing the traffic as well as providing failover of a node that may result in withdrawal/unavailability of a route. Similarly, to achieve scale, example data center arrangement 200 may deploy multiple instances of a service, e.g., as illustrated in FIG. 1. Examples of such services could be applications terminating the traffic in the data center (e.g., web applications) or they can be L2/L3 forwarding appliances themselves (e.g., a routing service involving one or more vRouters hosted in a VM). In general, such a service may be hosted on a public IP address, e.g., reachable from the Internet, which may be resolved into one or more internal IP addresses for accessing the service instance(s) depending on how the data center infrastructure is implemented. In operation, SDN controller 204 may be configured to announce multiple equal cost routes for the service instance IP as the prefix, with different nodes as next hops, to the gateway nodes. Traffic to these service instances is fed by the gateway node receiving the packet flow, which may be configured to make use of ECMP to spray the traffic to the various service instances. Typically, these multiple routes may be learnt by the gateway nodes, e.g., by interfacing with SDN controller 204 using MP-BGP.
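
By way of illustration only, the following Python sketch shows, under simplifying assumptions, how a controller might build one equal-cost route per service instance for a shared service IP; the names ServiceRoute and announce_ecmp_routes, and the next-hop addresses shown, are hypothetical placeholders rather than part of the disclosed implementation.

```python
# Illustrative sketch: one equal-cost route per service instance for the
# same service prefix, which a gateway would install as an ECMP group.
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceRoute:
    prefix: str        # service IP announced as the route prefix, e.g., "1.1.1.1/32"
    next_hop: str      # switching node fronting one service instance


def announce_ecmp_routes(service_ip: str, instance_next_hops: List[str]) -> List[ServiceRoute]:
    """Build one equal-cost route per service instance for the same prefix."""
    prefix = f"{service_ip}/32"
    return [ServiceRoute(prefix, nh) for nh in instance_next_hops]


# Two instances of the service at 1.1.1.1, reachable via two vSwitch nodes:
routes = announce_ecmp_routes("1.1.1.1", ["10.0.0.11", "10.0.0.12"])
for r in routes:
    # In a real deployment these would be sent over MP-BGP to each gateway node.
    print(f"advertise {r.prefix} next-hop {r.next_hop}")
```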

For purposes of the present patent application, the terms “flow”, “packet flow”, or terms of similar import, can be thought of as a stream of packets, wherein substantially all packets belonging to a specific flow may have a set of common properties, states, or characteristics, etc. A property can be a result of applying a function to one or more packet header fields (e.g., destination IP address), transport header fields (e.g., destination port number), or application header fields (e.g., Real-time Transport Protocol (RTP) header fields); one or more characteristics of the packet (e.g., number of Multiprotocol Label Switching (MPLS) labels); or one or more fields derived from packet treatment (e.g., next hop IP address, output interface). As will be seen below, a packet flow may be identified by a unique n-tuple, e.g., a 5-tuple, comprising, for instance, the protocol being used, source Internet Protocol (IP) address, source port, destination IP address, and destination port. A packet may be characterized as belonging to a particular flow if it satisfies substantially all properties of that flow. For example, packets with the same 5-tuple may belong to the same flow. Furthermore, the concept of a packet flow can be defined broadly, e.g., a Transmission Control Protocol (TCP) connection, or all traffic from a particular Media Access Control (MAC) address or Internet Protocol (IP) address, or all packets with the same Virtual LAN (VLAN) tag, or all packets from the same switch port, or all traffic having one or more user-defined control flags, as well as including any combination of the foregoing conditionalities.
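
To make the 5-tuple notion concrete, the following minimal Python sketch shows one possible in-memory representation of a flow key; the class and field names are illustrative assumptions and not part of the disclosed embodiments.

```python
# Minimal sketch of a 5-tuple flow key; names are illustrative only.
from typing import NamedTuple


class FlowKey(NamedTuple):
    protocol: str   # e.g., "TCP" or "UDP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Flow F1 from the example scenario of FIG. 3:
f1 = FlowKey("TCP", "2.2.2.2", 5000, "1.1.1.1", 80)

# Two packets belong to the same flow if their 5-tuples are equal:
assert f1 == FlowKey("TCP", "2.2.2.2", 5000, "1.1.1.1", 80)
```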

Turning to FIG. 3, depicted therein is an example load balancing scenario 300 in connection with data center 200 by way of illustration. Consider a service hosted on IP {1.1.1.1}, which is located at two VMs, Service Instance-1 and Service Instance-2, exemplified at nodes 210-1, 210-2, respectively. Each of gateway nodes 208-1, 208-2 is provided with two equal cost routes to the service from SDN Controller 204, one for each service instance. Consider DC-GW1 208-1 receiving traffic on two n-tuple flows, F1 (e.g., DST IP={1.1.1.1}, SRC IP={2.2.2.2}, TCP DST port=80, TCP SRC port=5000) and F2 (DST IP={1.1.1.1}, SRC IP={3.3.3.3}, TCP DST port=80, TCP SRC port=6000), from two end points in the external network toward the service (exemplified as traffic 302). Using DC-GW's own load balancing mechanisms such as 5-tuple hashing (over which SDN controller 204 has no control), flows F1 306 and F2 308 are load balanced and routed via respective ECMP paths 304 such that the packets from flow F1 306 will be directed to Service Instance-1 210-1 and packets from flow F2 308 are directed to Service Instance-2 210-2 under normal operation.
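
The following sketch illustrates 5-tuple hash-based ECMP next-hop selection and why two gateways (or one gateway after a failover) may map the same flow to different switching nodes; the seeded hash shown is a simplifying assumption standing in for vendor-specific load balancing, not the actual DC-GW algorithm.

```python
# Sketch of 5-tuple hash-based ECMP next-hop selection.
# The per-gateway seed is an illustrative stand-in for the fact that
# different gateways may use different (uncontrolled) hashing schemes.
import hashlib


def ecmp_next_hop(five_tuple: tuple, next_hops: list, seed: int) -> str:
    """Pick one of the equal-cost next hops by hashing the 5-tuple."""
    key = repr((seed,) + five_tuple).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


f1 = ("TCP", "2.2.2.2", 5000, "1.1.1.1", 80)
hops = ["vSwitch1", "vSwitch2"]

# DC-GW1 and DC-GW2 hold the same ECMP routes but hash differently
# (modeled by different seeds), so flow F1 may land on different
# switching nodes depending on which gateway it enters:
print("DC-GW1 sends F1 to", ecmp_next_hop(f1, hops, seed=1))
print("DC-GW2 sends F1 to", ecmp_next_hop(f1, hops, seed=2))
```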

Whereas the load balancing as well as gateway redundancy and provisioning of multiple instances of a service can be advantageous, a data center may still encounter failures with respect to certain types of flows for which the foregoing features will not be adequate. For example, the service/application hosted at IP {1.1.1.1} might develop a state with respect to F1 (e.g., a subscriber Packet Data Protocol or PDP context which contains the subscriber's session information when the subscriber has an active session). For certain applications, this condition means the packets on F1 should always be sent to the same service instance, referred to as a flow stickiness or flow persistence requirement. In case of ECMP-based routing, it can be seen that certain types of network events can easily disrupt flow stickiness, which can cause service disruption, degradation of quality, etc.

FIGS. 4-6 depict example failure scenarios in the data center arrangement 200 of FIG. 2 that may be overcome according to one or more embodiments of the present invention. In the example scenario 400 of FIG. 4, a flow re-routing event is illustrated that may be caused due to, e.g., a gateway failure. Consider where DC-GW1 208-1 fails for some reason and flows F1 and F2 fail over to DC-GW2 208-2. Whereas this second gateway node, DC-GW2 208-2, also has its equal cost routes to the service, which are the same as the equal cost routes provided to the first gateway node (DC-GW1 208-1), the load balancing mechanisms used by the two nodes can be completely different. Consequently, it is possible that DC-GW2 208-2 directs incoming traffic 402 such that F1 flow 406 is routed to vSwitch 206-2 coupled to Service Instance-2 210-2 (instead of vSwitch 206-1 coupled to Service Instance-1 210-1 as was done by DC-GW1 208-1 prior to failover) while F2 flow 408 is directed to vSwitch 206-1 for Service Instance-1 210-1, as part of ECMP routing 404. Assuming F1 requires flow stickiness (e.g., due to the particular subscriber's service level agreement), re-routing of F1 from vSwitch 206-1 to vSwitch 206-2 can cause undesirable negative effects on the service quality.

In the example scenario 500 shown in FIG. 5, a similar flow re-routing event is illustrated that may be caused due to, e.g., unreachability of a particular gateway because of an external link failure, etc. A subscriber end point host/device 502 is operative to reach DC-GW1 208-1 pursuant to a subscriber session via a network path 506A effectuated in an external network 504. Similar to the example scenario in FIG. 4, this subscriber packet flow is referred to as F1 flow, which is routed to vSwitch 206-1 coupled to Service Instance-1 prior to encountering a link failure in this path. Whereas this flow may now enter the data center via the second gateway node DC-GW2 208-2 (e.g., due to a link failover detection), it may be routed to vSwitch 206-2 coupled to Service Instance-2 pursuant to the load balancing of DC-GW2 208-2, as exemplified as F1 flow 510. It should be noted that other flows entering DC-GW1 208-1 on different links may not be reset to other gateway nodes in this case. Accordingly, F2 flow 508 (e.g., with respect to a different subscriber's session and on a different network link connection) may continue to be routed to vSwitch 206-2 coupled to Service Instance-2 due to the load balancing of DC-GW1 208-1 as before.

Turning to FIG. 6, example scenario 600 depicted therein is illustrative of a flow re-routing event that may be caused due to, e.g., intra-DC network events. Typically, a DC environment may be implemented wherein ECMP may be coupled with next-hop path monitoring protocols in order to facilitate, e.g., quick traffic redirection when multiple paths are available. For example, the DC-GWs may be configured to employ Bidirectional Forwarding Detection (BFD) to monitor intra-DC path status between the DC-GWs and the vSwitches. In some arrangements, BFD may be employed even for vSwitch⇔vSwitch path monitoring. Although there can be sufficient link redundancy within such a DC environment, path convergence after network failures might take some time. For example, in an L3 DC fabric this convergence time is dictated by the underlay fabric's routing protocol convergence times. Accordingly, in such a scenario, a BFD “DOWN” event can result in subscriber/tenant traffic re-direction. In the example scenario 600, for instance, inter-node communication link 602 between DC-GW1 208-1 and vSwitch node 206-1 may become unavailable (e.g., due to physical, electrical, or software conditions), which may be detected by DC-GW1 208-1 (with a BFD mechanism). Upon detecting that its connectivity with the vSwitch connected to Service Instance-1 is broken, DC-GW1 208-1 may be configured to redirect flow F1 606 toward vSwitch 206-2 coupled to Service Instance-2 210-2 via an alternative inter-node link 604. Skilled artisans will recognize that this failure scenario is different from the failure scenarios set forth above as DC-GW2 208-2 is not involved in the redirection of the affected flow (i.e., flow F1). Further, the routing path for F2 flow 608 from DC-GW1 208-1 to vSwitch node 206-2 remains unaffected because it uses the functioning inter-node link 604.

Since Service Instance-2 210-2 does not have or maintain any state/context associated with the F1 flow (and vice versa) in the scenarios above, the corresponding connection may be reset, thereby forcing an end point to restart the connection and begin transmitting traffic again. It should be appreciated that in DC environments having a large number of such sticky flows at any given moment, the impact of flow re-routing in the foregoing scenarios due to failures can be substantial.

FIG. 7 is a flowchart illustrative of various blocks, steps and/or acts of a method 700 operating at an SDN-based data center (e.g., data center 116, 136, 200) that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure. At block 702, a plurality of switching nodes of the data center are programmed by an SDN controller associated therewith (e.g., SDN controller 152, 204) in order to facilitate learning of new packet flows into the data center. As will be set forth below, the switching nodes of the data center may be instructed to forward (or, “punt”) the very first packet of a new flow (i.e., a packet flow having a new n-tuple identifier set) received by the switching nodes to the SDN controller. In one example implementation, the n-tuple identifier set may comprise a 5-tuple that specifies a source IP address, a source port number, a destination IP address, a destination port number and a protocol in use, although other examples of n-tuples having suitable pieces of identifier data (e.g., 2-tuple, 3-tuple, etc.) may also be used in additional/alternative embodiments. At block 704, the SDN controller may learn, ascertain, determine, identify or otherwise detect that a subscriber session (and by extension, its associated packet flow) has or otherwise requires flow stickiness or persistence. In one embodiment, the SDN controller may receive a subscriber service policy message (e.g., from an OSS/BSS management node, customer policy management node, etc.) indicating that a particular subscriber session requires flow stickiness (e.g., due to a service level agreement). In another arrangement, the SDN controller may perform a statistical analysis of an existing flow and determine that the packet flow requires flow stickiness (e.g., due to the observation that the packets of a particular flow have been routed to a specific switching node associated with a certain service instance over a select time period, thereby establishing or acquiring a subscriber PDP context or state). Regardless of how a packet flow, either new or existing, is determined to acquire or require flow stickiness, the SDN controller may be advantageously configured to generate various commands, instructions, or messages to the switching nodes and/or the gateway nodes to avoid the ECMP routes normally used by the gateway nodes with respect to such sticky flows even when one or more failure conditions such as the example failure scenarios set forth above have been observed in the data center (block 706). In additional/alternative embodiments, the SDN controller may also be advantageously configured to generate various commands, instructions, or messages only to the switching nodes of the data center to redirect the packets of sticky flows received thereat (e.g., due to normal ECMP routing) to one or more other switching nodes that were associated with the sticky flows prior to observing a failure condition in the data center (block 706).
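
A minimal controller-side sketch of the decision flow of method 700 follows. All names are hypothetical placeholders (there is no such controller API in the disclosure): the stickiness check is reduced to a policy set, and the two helpers merely print the BGP-FS and OpenFlow actions that are elaborated in later Figures.

```python
# Sketch of the controller-side logic of method 700; helper names are
# hypothetical placeholders, not a real controller API.

sticky_policy = set()   # n-tuples flagged sticky, e.g., via an OSS/BSS policy message
known_flows = {}        # n-tuple -> switching node that first reported the flow


def requires_stickiness(n_tuple) -> bool:
    """Block 704: policy-driven check; statistical detection could be added here."""
    return n_tuple in sticky_policy


def advertise_bypass_route_bgp_fs(gateway, n_tuple, next_hop):
    """Stand-in for a BGP-FS advertisement to one gateway (see FIG. 8A)."""
    print(f"BGP-FS to {gateway}: match {n_tuple} -> next-hop {next_hop}")


def program_switch_redirect(switch, n_tuple, destination):
    """Stand-in for an OpenFlow match-action rule pushed to one switch."""
    print(f"OpenFlow to {switch}: match {n_tuple} -> redirect to {destination}")


def on_first_packet_punt(n_tuple, reporting_switch, all_switches, gateways):
    """Handle the first packet of a new flow punted by a switching node (block 702)."""
    known_flows[n_tuple] = reporting_switch
    if not requires_stickiness(n_tuple):
        return
    # Block 706, option 1: steer the gateways around their ECMP routes via BGP-FS.
    for gw in gateways:
        advertise_bypass_route_bgp_fs(gw, n_tuple, next_hop=reporting_switch)
    # Block 706, option 2 (alternative embodiment): redirect at the other switches.
    for sw in all_switches:
        if sw != reporting_switch:
            program_switch_redirect(sw, n_tuple, destination=reporting_switch)


# Usage: flow F1 is declared sticky and its first packet is punted by vSwitch1.
sticky_policy.add(("TCP", "2.2.2.2", 5000, "1.1.1.1", 80))
on_first_packet_punt(("TCP", "2.2.2.2", 5000, "1.1.1.1", 80),
                     "vSwitch1", ["vSwitch1", "vSwitch2"], ["DC-GW1", "DC-GW2"])
```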

The foregoing embodiments will now be described in further detail below by taking reference to the remaining Figures of the present application, wherein the processes set forth in one or more blocks of FIG. 7 may be further illustrated in conjunction with additional Figures. It will be seen that common to the foregoing embodiments is a programming/learning phase where the SDN controller learns what new flows are coming into the data center and/or if any new flows (or existing flows) require/acquire flow persistence or stickiness characteristics.

One or more embodiments set forth herein advantageously leverage the fact that an SDN controller (e.g., SDN controller 152, 204) operating according to the OpenFlow (OF) specification may be configured to disseminate very granular n-tuple specific information (e.g., 5-tuples as set forth above) into applicable network protocols used with respect to the border nodes (e.g., gateway nodes operative with Multiprotocol Extensions for Border Gateway Protocol (BGP), sometimes also referred to as Multiprotocol BGP or Multicast BGP, MP-BGP for short) in order to control, modify or otherwise influence the gateway nodes' packet routing behavior for select packet flows. Although the SDN controller cannot influence a load balancing algorithm executing in the DC-GW nodes, it can be configured to learn the n-tuple flows that are directed towards the switching nodes by a DC-GW. To accomplish this, a switching node of the data center (e.g., an OF switch such as vSwitch1 206-1 in FIG. 2) may be programmed in an embodiment of the present invention by the SDN controller to punt only the very first packet on a new 5-tuple arriving from a DC-GW to the controller, as previously set forth. Taking reference to FIGS. 8A-8C, the flowcharts set forth therein are illustrative of various blocks, steps and/or acts that may be (re)combined in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure. Process 800A of FIG. 8A sets forth an embodiment wherein gateway nodes of a data center are provided with suitable alternative routing paths advertised by an SDN controller with respect to sticky flows identified by respective n-tuples. At block 802, the SDN controller generates appropriate instructions (e.g., OF-compliant instructions) to a plurality of switching nodes of the data center to forward first packets of new flows coming from various gateway nodes pursuant to various subscriber sessions involving one or more external networks. Responsive thereto, a first packet of a packet flow is received from a first switching node coupled to a service instance of a service hosted by the data center, wherein the first packet is identified by an n-tuple flow specification, the packet flow flowing from a DC gateway connected to the external network (block 804). At block 806, a determination is made that the packet flow requires or has acquired flow stickiness characteristics. As set forth previously in reference to FIG. 7, a variety of mechanisms involving operator policies, service provider policies, subscriber-specific policies, etc., as well as statistical analysis of network flows within the data center, may be used in respect of such determinations. Responsive to the determination that a packet flow requires or has acquired flow stickiness characteristics, bypass or alternate routing paths may be computed by the SDN controller with respect to the packet flow, wherein the bypass routing paths have a switching node destination identified as the first switching node and are associated with the n-tuple specification of the packet flow. Further, the bypass routing paths are advertised by the SDN controller to each of the DC gateways via a suitable BGP-based mechanism, as set forth at block 808.
In one example implementation, the SDN controller uses BGP Flow Specification (flowspec) (BGP-FS) to provide the DC gateway nodes with the alternate routing path information regarding the packet flow (e.g., including the first switching node as the destination, the n-tuple flow specification, as well as next hop information, and the like). BGP Flow Specification, set forth in RFC 5575, incorporated by reference herein (see https://tools.ietf.org/html/rfc5575), may be implemented in accordance with the teachings herein to specify packet fields as prefixes (e.g., equivalent to Access Control Lists or ACLs), as opposed to traditional MP-BGP, which matches only on destination IPs in the packet. As will be seen in further detail below, this alternate routing path information may be utilized by the DC gateway nodes by implementing a suitable packet pipeline mechanism for avoiding conventional ECMP routing (i.e., bypassing ECMP hits) with respect to any sticky packet flows arriving from external networks.
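
The sketch below models, in simplified form, the content of one such bypass advertisement: the RFC 5575 match components for a 5-tuple and the next hop pointing at the first switching node. The dataclass is an in-memory stand-in for illustration only; it does not implement the BGP-FS wire encoding, and the addresses and helper name are assumptions.

```python
# Simplified stand-in for an RFC 5575 flow specification rule; this models
# the match components and the bypass next hop, not the BGP wire encoding.
from dataclasses import dataclass


@dataclass
class FlowSpecRule:
    dst_prefix: str     # RFC 5575 component type 1 (destination prefix)
    src_prefix: str     # type 2 (source prefix)
    ip_protocol: int    # type 3 (6 = TCP, 17 = UDP)
    dst_port: int       # type 5 (destination port)
    src_port: int       # type 6 (source port)
    next_hop: str       # action: steer matching traffic to this switching node


def bypass_rule_for_sticky_flow(n_tuple, first_switch_ip: str) -> FlowSpecRule:
    """Build the bypass rule advertised to the gateways for a sticky flow."""
    proto, src_ip, src_port, dst_ip, dst_port = n_tuple
    return FlowSpecRule(
        dst_prefix=f"{dst_ip}/32",
        src_prefix=f"{src_ip}/32",
        ip_protocol=6 if proto == "TCP" else 17,
        dst_port=dst_port,
        src_port=src_port,
        next_hop=first_switch_ip,
    )


# Flow F1 stays pinned to the switching node fronting Service Instance-1:
rule = bypass_rule_for_sticky_flow(("TCP", "2.2.2.2", 5000, "1.1.1.1", 80), "10.0.0.11")
print(rule)
```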

In an additional or optional variation, a determination may be made whether the sticky packet flow session has terminated, timed out, or otherwise inactivated (block 810). Responsive thereto, the SDN controller may provide suitable instructions, commands, or control messages to the DC gateways to inactivate and/or delete the bypass routing paths advertised via BGP-FS (block 812).

Process 800B shown in FIG. 8B illustrates an embodiment of a packet processing method at a DC gateway node (e.g., DC-GW 208-1, 208-2). At block 822, the DC gateway nodes receive bypass routing paths from an SDN controller (e.g., SDN controller 152, 204) via BGP-FS for select packet flows having flow stickiness requirements and/or characteristics. At block 824, a suitable routing data structure, referred to herein as a matching n-tuple Forwarding Information Base (FIB) or FlowSpec FIB (FS-FIB), is populated with the bypass routing paths including select n-tuple flow identifier sets and specific switching node destinations corresponding thereto. In accordance with the teachings of the present invention, the matching FIB is disposed in a gateway packet pipeline (set forth in additional detail hereinbelow) in relation to a Longest Prefix Match (LPM) Forwarding Information Base (FIB) used for determining ECMP routing paths to one or more switching nodes of the data center. Further, in an example embodiment, the bypass routing paths provided via BGP-FS are accorded a higher priority than ECMP-based routing paths. When packets of a particular packet flow arrive at the DC gateway pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set, a determination may be made if the particular n-tuple identifier set of the particular packet flow matches an entry in the matching FIB, as set forth at blocks 826 and 828. If so, responsive to the determining that the n-tuple of the packet flow has a hit in the matching FIB, the packets of the particular packet flow may be forwarded to a corresponding switching node identified in the bypass routing path associated with the particular n-tuple identifier set instead of routing the particular packet flow according to an ECMP path computation (block 830). In a further variation, a determination may be made and/or an instruction may be obtained from the SDN controller that the particular packet flow session has been terminated, timed out or otherwise inactivated. Responsive thereto, the DC gateway node may be configured to delete the n-tuple flow entry or record of that packet flow from the matching FIB, as set forth at block 832.
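
A minimal sketch of the gateway-side lookup order of process 800B follows, assuming the FS-FIB is modeled as a dictionary keyed by exact 5-tuples and the LPM/ECMP path is passed in as a placeholder callable; it illustrates the priority given to BGP-FS entries, not an actual forwarding-plane implementation.

```python
# Sketch of the DC gateway lookup order: exact-match FS-FIB first (block 828),
# then the LPM FIB / ECMP path on a miss (process 800C). The data structures
# are illustrative assumptions, not a forwarding-plane design.

fs_fib = {}   # n-tuple -> switching node installed from BGP-FS (blocks 822/824)


def install_bypass_route(n_tuple, switching_node):
    fs_fib[n_tuple] = switching_node


def delete_bypass_route(n_tuple):
    fs_fib.pop(n_tuple, None)   # block 832: session terminated or timed out


def forward(packet_n_tuple, lpm_ecmp_lookup):
    """Return the next hop for a packet arriving at the gateway."""
    if packet_n_tuple in fs_fib:            # FS-FIB hit: bypass ECMP (block 830)
        return fs_fib[packet_n_tuple]
    return lpm_ecmp_lookup(packet_n_tuple)  # miss: fall through to LPM FIB + ECMP


# Usage: pin sticky flow F1 to vSwitch1 regardless of the gateway's hashing.
install_bypass_route(("TCP", "2.2.2.2", 5000, "1.1.1.1", 80), "vSwitch1")
print(forward(("TCP", "2.2.2.2", 5000, "1.1.1.1", 80), lambda t: "ECMP next hop"))
```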

Process 800C depicted in FIG. 8C, which may be augmented with some of the example processes set forth above in additional/alternative embodiments, is illustrative of a scenario in which a packet flow received at a DC gateway node does not match the records in the DC gateway's n-tuple FS FIB. At block 852, a determination is made that the n-tuple set of the packet flow does not have an entry (i.e., no hit) in the matching FS FIB. Responsive thereto, the packets are forwarded to an LPM FIB structure provided as part of the gateway's packet pipeline, as set forth at block 854. As ECMP is used in computing paths with respect to the LPM FIB (e.g., employing a hash-based distribution for load balancing), the packet flow not having a hit in the matching FIB may be routed to a next hop toward any switching node as per the ECMP path computation (block 856). In one arrangement, hash-based load balancing may involve hashing on any portion of an n-tuple flow specification set, e.g., source IP address, source port, destination IP address, destination port, and protocol type, to map traffic to available servers/switches. As such, there is no requirement of a "stateful ECMP" on the IP addresses for traffic processed via the LPM FIB forwarding path in an example embodiment of the present invention.
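The hash-based distribution mentioned above may be sketched as follows (illustrative only; a real gateway would typically compute the hash in hardware): a stable digest of the 5-tuple maps every packet of a given flow to the same next hop without keeping any per-flow state.

    import hashlib

    def ecmp_next_hop(five_tuple, next_hops):
        digest = hashlib.sha256("|".join(map(str, five_tuple)).encode()).digest()
        return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

    hops = ["vswitch-1", "vswitch-2", "vswitch-3"]
    flow = ("203.0.113.7", "192.0.2.10", 6, 40000, 443)
    # All packets of the same flow hash to the same next hop (no ECMP state needed).
    assert ecmp_next_hop(flow, hops) == ecmp_next_hop(flow, hops)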

FIG. 9 is a flowchart illustrative of a process 900 for facilitating learning of new flows in a data center according to an embodiment of the present invention. As noted elsewhere in the patent application, various blocks, steps and/or acts of process 900 may be (re)combined and/or replaced in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts set forth herein. At block 902, a switching node receives packets from one or more gateway nodes with respect to one or more packet flows, each having a corresponding n-tuple set. When a packet of a packet flow arrives, the corresponding n-tuple is matched against a flow identification match database (block 904). If it is determined that the packet belongs to a new flow, i.e., the n-tuple does not have a hit in the flow identification match database, the packet is a first packet and a copy of that packet is forwarded to the SDN controller of the data center, as set forth at block 906. Further, the flow identification match database is updated or otherwise populated by adding the n-tuple of the new flow (block 908). In one arrangement, such updating and/or populating of a flow identification match database at a switching node may be effectuated responsive to OF-based instructions from the SDN controller, as described elsewhere in the present patent application. Suitable mechanisms may also be provided for non-OF switching nodes for creating and populating appropriate databases, tables or data structures in order to facilitate identification of new flows to a data center controller in additional or alternative embodiments of the present invention. In a further variation, a determination may be made and/or an instruction may be obtained from the SDN controller that a packet flow session has been terminated, timed out, or otherwise inactivated. Responsive thereto, the DC switching node may be configured to delete the n-tuple flow entry or record of that packet flow from the flow identification match database, as set forth at block 910. Skilled artisans will further appreciate that at least a portion of the foregoing acts or blocks may be implemented as the programming and learning phases set forth above in reference to FIGS. 7 and 8A.
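A minimal sketch of the learning steps of process 900 is given below (assumed names, illustrative only): a miss in the flow identification match database causes the first packet to be copied to the controller and the new n-tuple to be learned.

    flow_id_db = set()   # n-tuples of flows already seen by this switching node

    def on_packet(pkt, punt_to_controller, deliver_locally):
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
               pkt["src_port"], pkt["dst_port"])
        if key not in flow_id_db:        # table miss: a new flow (blocks 904/906)
            punt_to_controller(pkt)      # copy of the first packet to the controller
            flow_id_db.add(key)          # block 908: learn the new n-tuple
        deliver_locally(pkt)             # continue normal pipeline processing

    on_packet({"src_ip": "203.0.113.7", "dst_ip": "192.0.2.10", "proto": 6,
               "src_port": 40000, "dst_port": 443},
              punt_to_controller=lambda p: print("punt to controller"),
              deliver_locally=lambda p: None)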

FIGS. 10A and 10B depict example pipeline structures and associated components of a switching node and a gateway node, respectively, according to an embodiment of the present invention. Reference numeral 1000A in FIG. 10A generally refers to a portion of a data center environment (such as, e.g., data center 200 in FIG. 2) that exemplifies a switching node packet pipeline 1002, which includes an n-tuple flow identification database, table or other suitable data structure 1004 that may be programmed and/or populated for purposes of learning new flows in the data center. Packets arriving from the DC gateway nodes for various packet flows, e.g., packet flow 1006, may be run through the n-tuple flow identification database 1004 for interrogation. In one example implementation, the n-tuple may comprise a 5-tuple flow specification that includes a source IP address (src-IP), a destination IP address (dst-IP), protocol, source port address (src-port), and destination port address (dst-port), although a combination/subcombination thereof as well as other indicia may be used in additional/alternative arrangements, as previously noted. Any table miss entry 1005 indicates that it is a new packet flow with a corresponding new n-tuple, and a copy of that packet may be punted to SDN controller 204 via path 1008. In one example embodiment, depending on determining that the packet flow requires flow stickiness, SDN controller 204 may be configured to instruct the switching node to add the new 5/n-tuple back into the flow identification database 1004, as indicated by programming path 1010. Skilled artisans will recognize that by adding the n-tuple back into the flow identification database 1004, a switching node may be prevented from further punting of packets from the switching node to the controller. However, in some embodiments there can still be a flurry of packet punts to SDN controller 204. Accordingly, in an additional or alternative implementation, such a situation may be avoided by configuring the switching node to add an n-tuple entry into the flow identification database 1004 on its own (e.g., without having to receive instructions from SDN controller 204), preferably with a timing threshold value, such as an IDLE_TIMER value, aging timer, etc., which may be individually configured for different packet flows. Although a single IDLE_TIMER block 1020 associated with the flow identification database 1004 is shown in FIG. 10A, it will be apparent that each table entry or record may include a corresponding timer value in some embodiments. Furthermore, similar timing mechanisms may also be provided to age out stale n-tuple entries corresponding to completed sessions.
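The self-learning variant with an IDLE_TIMER may be sketched as follows (names and the per-node timeout value are assumptions): each learned n-tuple carries a last-seen timestamp that is refreshed on every hit and aged out once the flow has been idle longer than its threshold.

    import time

    IDLE_TIMEOUT = 60.0                    # could be configured per flow
    flow_id_db = {}                        # n-tuple -> time a packet was last seen

    def note_packet(key):
        is_new = key not in flow_id_db
        flow_id_db[key] = time.monotonic() # refresh the idle timer on every hit
        return is_new                      # True -> punt a copy to the controller

    def age_out(now=None):
        now = now if now is not None else time.monotonic()
        stale = [k for k, seen in flow_id_db.items() if now - seen > IDLE_TIMEOUT]
        for k in stale:
            del flow_id_db[k]              # stale entry for a completed session
        return stale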

If there is a match found in the flow identification database 1004, e.g., as exemplified by match entry record 1011, the packet may proceed to further processing, e.g., being delivered to the service instance, as exemplified by path 1012.

Responsive to determining that a new flow or an existing flow requires or has acquired flow stickiness, SDN controller 204 is operative in one embodiment to provide alternate bypass routes or updates 1014 using BGP-FS to one or more gateway nodes in order to facilitate avoidance of conventional ECMP-based routing paths, as set forth hereinabove. In one arrangement, once the controller identifies the n-tuple flow as being a sticky flow, it advertises the n-tuple with the Next-Hop (NH) configured to the switching node having the state/session stickiness with respect to that flow. The route advertised via BGP-FS for the flow is identical toward all the gateways, such that any gateway receiving that packet flow will use the route to forward the packets to the specific switching node. In another arrangement, SDN controller 204 may be configured to provide suitable match-action rules 1016 to the DC switching nodes that identify specific switching node destinations for the sticky flows in order to facilitate redirection of such sticky flows to the appropriate switching nodes, as will be described in additional detail further below. It should be appreciated that these two embodiments may be practiced independently or jointly in some arrangements.

In an implementation involving OVS architecture, an embodiment of the present invention may be configured to employ OpenFlow Nicira extensions (see http://docs.openvswitch.org/en/latest/tutorials/ovs-advanced/, incorporated by reference herein) for facilitating the addition of a flow entry into the flow identification database 1004, e.g., with respect to a new flow. By way of illustration, an NXAST_LEARN action may be provided as a new action for facilitating flow entries into a database in an OVS implementation. In a non-OVS implementation, an embodiment of the present invention may be configured to provide an OF switch operative to support an equivalent OF extension.

Since a DC switching node of the present invention may be configured to add an n-tuple match entry for a packet flow into the flow identification database 1004 just prior to punting the first packet to SDN controller 204, the subsequent packets of the flow are not punted to the controller as they would not miss the entry match (i.e., a hit is found on the subsequent packets). In one arrangement, SDN controller 204 may be configured to maintain a cache of ongoing flows in the data center network. Upon receiving a punted packet, the SDN controller is operative to extract the n-tuple information from the packet, update the cache, and provide applicable BGP-FS routing updates and/or match-action rules to the various DC nodes.
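Controller-side handling of a punted packet may be sketched as follows (the helper callables are hypothetical placeholders for the controller's BGP speaker and OF programming interfaces): the n-tuple is cached and, for sticky flows, bypass routes and/or redirect rules are pushed out.

    flow_cache = {}   # n-tuple -> {"sticky": bool, "vswitch": node that punted}

    def on_punt(n_tuple, punting_vswitch, is_sticky, advertise_bgp_fs, push_rules):
        flow_cache[n_tuple] = {"sticky": is_sticky, "vswitch": punting_vswitch}
        if is_sticky:
            # Identical BGP-FS route toward the sticky vSwitch, to all gateways.
            advertise_bgp_fs(n_tuple, next_hop=punting_vswitch)
            # Optionally, switch-side redirection rules (the second embodiment).
            push_rules(n_tuple, redirect_to=punting_vswitch)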

Turning to FIG. 10B, depicted therein is an example packet pipeline structure 1000B of a DC gateway node for purposes of an embodiment of the present invention. An n-tuple match Forwarding Information Base (FIB) 1054 may be provided as part of the packet pipeline structure 1000B, which may be populated and/or otherwise programmed with the bypass path routing information for select packet flows (e.g., packet flows determined to be sticky) advertised via BGP-FS by the SDN controller. As noted elsewhere, the bypass routing path information may be comprised of, e.g., the n-tuple, the destination switching node, next hop reachability, etc. In one arrangement, the n-tuple match FIB 1054 may be installed in the packet pipeline 1000B with a higher priority than conventional ECMP routing mechanism(s) that may already be present in the gateway node, which may utilize a Longest Prefix Match (LPM) Forwarding Information Base (FIB) 1058 for packet routing. When packets of a packet flow 1052 arrive at the gateway node, the n-tuple of the packet flow is interrogated against the n-tuple matching FIB 1054. If there is a hit, a corresponding BGP-FS bypass routing path is used for forwarding the packets to a Next-Hop toward the particular switching node as destination, as exemplified in block 1056. If there is a miss in the matching FIB 1054, the packets are subjected to conventional ECMP-based routing, e.g., by utilizing LPM FIB 1058 in conjunction with an ECMP load balancing module 1060 for routing the packets across one or more available switching nodes (i.e., stateless spraying).

Because the BGP-FS based routing paths are identical in all DC gateway nodes, failure of one node will not impact the forwarding behavior for the sticky flows. Accordingly, ongoing sticky flows will continue to be forwarded to the same service instance regardless of the gateway(s) forwarding the packets for other flows using conventional ECMP mechanisms. One skilled in the art will appreciate that this bypass forwarding behavior advantageously ensures that the sticky sessions will not be terminated in a data center of the present invention if certain failure modes, such as one or more of the failure scenarios exemplified above in reference to FIGS. 4-6, are encountered.

Skilled artisans will further recognize that the foregoing gateway node packet pipeline structure 1000B is illustrative rather than limiting, and a number of variations may be implemented in accordance with the teachings herein as long as a logical pipeline may be effectuated with the bypass routes having a higher priority than any conventional ECMP-based routing mechanisms. Example embodiments may be configured such that implementing BGP-FS does not alter the ECMP behavior of border routers such as the gateway nodes. In a typical embodiment, ECMP routing mechanisms may continue to be stateless even with BGP-FS routes, whereas BGP-FS route based matches are handled in separate tables as ACL matches. Further, the routing databases associated with example gateway nodes' respective packet pipelines may be provided as part of a virtual routing and forwarding (VRF) arrangement that can allow different routing FIBs as separate routing instances.

FIGS. 11A and 11B are flowcharts illustrative of various blocks, steps and/or acts that may be (re)combined or replaced in one or more arrangements, with or without blocks, steps and/or acts of additional flowcharts of the present disclosure, with respect to embodiments relating to redirection of sticky packet flows by switching nodes of a data center. Process 1100A set forth in FIG. 11A is illustrative of a set of steps, blocks and/or acts that may take place at an SDN controller of the data center, at least some of which pertain to a learning/programming phase of the SDN controller similar to the embodiments set forth hereinabove. Accordingly, skilled artisans will recognize that the acts set forth at blocks 1102, 1104, 1106 are substantially identical to the acts set forth at blocks 802, 804, 806 of FIG. 8A, whose description is equally applicable here, mutatis mutandis. At block 1108, responsive to the determination that a packet flow requires or has acquired flow stickiness characteristics, the SDN controller is operative to generate programming instructions to each switching node of the data center disposed on the ECMP paths (preferably other than the switching node punting the first packet, which may be referred to as a first switching node) to redirect any packets of the packet flow arriving at such switching nodes to the first switching node. In one example embodiment, suitable OF-based match-action rules with respect to the sticky flows may be provided to the switching nodes, as noted elsewhere in the present patent application. Upon determining that the packet flow session for a sticky flow has been terminated, further instructions may be provided to the switching nodes to delete, inactivate or otherwise disable the match-action rules, as set forth at blocks 1110 and 1112.
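The programming step of block 1108 may be sketched as follows (assumed helper names, illustrative only): for a sticky flow first seen at a given switching node, redirect rules are installed on every other switching node on the ECMP paths.

    def program_redirects(n_tuple, first_switch, all_switches, install_rule):
        for sw in all_switches:
            if sw == first_switch:
                continue                  # the sticky node keeps normal delivery
            install_rule(switch=sw,
                         match=n_tuple,   # match on the flow's n-tuple identifier
                         action=("redirect", first_switch))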

Process 1100B of FIG. 11B is illustrative of various steps, acts and/or blocks operating at a switching node, which may be augmented with other steps, acts and/or blocks set forth above in certain embodiments. At block 1122, a switching node forwards first packets of new packet flows received at the switching node to the SDN controller, the first packets each containing respective n-tuple identifier sets of corresponding packet flows and flowing from one or more gateway nodes pursuant to respective subscriber sessions via an external network. At block 1124, a flow identification database may be populated and/or otherwise programmed with the n-tuple data for identifying the new packet flows as they arrive at the switching node. As noted previously, one or more of the foregoing acts may be performed by a switching node responsive to receiving instructions from the SDN controller or may be performed on its own pursuant to the switching node's configuration. At block 1126, packets are received at a switching node (i.e., a packet receiving switching node or a first switching node) for a particular packet flow having an n-tuple set, which may be interrogated against the flow identification database for a match (block 1128). Responsive to determining that the packet flow is not a new flow (e.g., the n-tuple set of the packets matches an entry, i.e., a hit, in the flow identification match database), one or more match-action rules provided by the SDN controller may be applied against the packet flow, wherein the match-action rules are operative to identify a destination DC switching node other than the packet receiving switching node (block 1130). The packet receiving switching node may then redirect the packet flow to the destination DC switching node identified in accordance with the match-action rules (block 1132). In a further variation, a determination may be made and/or an instruction may be obtained from the SDN controller that a packet flow session has been terminated, timed out or otherwise inactivated. Responsive thereto, the DC switching node may be configured to delete the n-tuple flow entry or record of that packet flow from the flow identification match database, as set forth at block 910 of FIG. 9, which may be augmented within the flowcharts of FIGS. 11A and 11B.

FIG. 12 depicts an example packet structure and an associated match-action rule illustrative for purposes of an embodiment of the present invention. Reference numeral 1200 refers to an example packet including appropriate L2/L3/L4 header fields 1202, 1204, 1206, respectively, and a packet payload portion 1208, which may contain any data relating to a subscriber session, e.g., bearer, media, and/or control signaling data. One or more additional structures may be maintained in the packet processing pipeline of a switching node that may contain information regarding packet metadata 1210 as well as a set of actions 1220 to be applied to a packet. By way of illustration, physical port ID 1212, logical port ID 1214, other metadata 1216 as well as n-tuple data 1218 may comprise packet metadata 1210. Applicable action rules 1220 may comprise actions for discarding, modifying, queuing, or forwarding the packet to a particular destination, which are exemplified as actions 1222-1 to 1222-X in FIG. 12. Depending on whether version 1.0 or a subsequent version (e.g., ver. 1.1.0) of the OpenFlow protocol is used, an embodiment of the present invention may involve using a FlowMod message, or a variation thereof, with respect to providing suitable match-action rules as well as extensions as previously noted.
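For illustration only, one possible encoding of such a match-action rule as an OpenFlow 1.3 FlowMod is sketched below using the open-source Ryu controller framework, which is not part of this disclosure; the match assumes a TCP flow, and dp is assumed to be a connected datapath object available inside a Ryu event handler.

    # Sketch of a redirect rule pushed as an OFPFlowMod (Ryu, OpenFlow 1.3).
    def install_redirect_rule(dp, five_tuple, out_port, idle_timeout=60):
        ofp, parser = dp.ofproto, dp.ofproto_parser
        src_ip, dst_ip, proto, src_port, dst_port = five_tuple
        match = parser.OFPMatch(eth_type=0x0800, ip_proto=proto,
                                ipv4_src=src_ip, ipv4_dst=dst_ip,
                                tcp_src=src_port, tcp_dst=dst_port)  # assumes proto == 6
        actions = [parser.OFPActionOutput(out_port)]  # port toward the sticky vSwitch
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100, match=match,
                                      idle_timeout=idle_timeout, instructions=inst))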

FIGS. 13-15 depict example solution scenarios in the data center of FIG. 2 according to one or more embodiments of the present invention, particularly in relation to the failure scenarios of FIGS. 4-6. Analogous to the failure scenario in FIG. 4, the solution scenario 1300 of FIG. 13 illustrates a situation where DC-GW1 208-1 has failed and the packet traffic flows have failed over to DC-GW2 208-2. Flow F1 1302 is now routed from DC-GW2 208-2 to vSwitch 206-2 pursuant to the ECMP routing of DC-GW2 208-2. When the packets arrive at vSwitch 206-2, they are matched against an action rule (e.g., a redirection of packets) provided by SDN controller 204 for the n-tuple of the F1 flow, which indicates that the packets should be redirected to vSwitch 206-1, as shown by the redirection flow 1304. As there is no match-action for Flow F2 408, it remains unaffected in the illustrative scenario. A similar redirection flow for F1 will be obtained where the external link to DC-GW1 208-1 for the F1 flow is compromised in some fashion, similar to the failure scenario 500 shown in FIG. 5.

In the solution scenario 1400 shown in FIG. 14, inter-nodal path 602, which may also be referred to as an intra-DC path, between vSwitch 206-1 and DC-GW1 208-1 experiences a failure, similar to the failure scenario 600 shown in FIG. 6. Whereas Flow F1 1404 is routed to vSwitch 206-2 via an alternative inter-nodal path 604, the programmed match-action rules at vSwitch 206-2 for the n-tuple corresponding to F1 are operative for redirecting the flow to vSwitch 206-1, similar to the foregoing scenarios, as exemplified by redirection flow 1406.

Redirection of flows that have/require stickiness as set forth above results in what may be referred to as a "Two-Hop" situation, in the sense that the leaf vSwitch nodes are required to add an extra "hop" to the "correct" switch (i.e., the vSwitch that has acquired a PDP context for a flow). As there may be many intermediary nodes, such as other switches, routers, bridges, brouters, etc., along the intra-DC paths, a "Hop" in this context simply means another vSwitch⇔vSwitch segment that results from the match-action rule application, not necessarily an actual "hop" to the nearest node along the path.

Although the Two-Hop solution embodiments set forth herein can advantageously address flow stickiness under various failure scenarios, they may engender additional latency and/or congestion in an example DC environment because of the extra hops. With respect to the situation where an inter-nodal link failure is encountered, the latency and/or congestion issues may not last long and/or the impact may not be severe since the inter-nodal links can undergo self-correction or annealing (e.g., because the underlying fabric routing protocol may converge relatively quickly). On the other hand, when a gateway failure situation is encountered, the latency and/or congestion can become significant in an example DC environment because no self-annealing of the network fabric is involved. To address this situation, embodiments relating to direct forwarding, e.g., as set forth in FIGS. 8A-8C based on bypass routes advertised via BGP-FS, can be particularly advantageous since they "short-circuit" the extra hop and instead forward the packets for a sticky flow directly to the correct vSwitch node as identified by an n-tuple hit in the gateway's n-tuple match FIB (i.e., the FlowSpec FIB). In the example solution scenario 1500 shown in FIG. 15, DC-GW1 208-1 has failed and the packet traffic flows have failed over to DC-GW2 208-2. Flow F1 1502 is now routed from DC-GW2 208-2 directly to vSwitch 206-1 pursuant to the BGP-FS path advertised by SDN controller 204 instead of using the conventional ECMP routing of DC-GW2 208-2. As hits in the ECMP FIBs are avoided in such a scenario, this type of routing may be referred to as "ECMP hitless" routing. As before, Flow F2 408 remains unaffected since conventional ECMP routing continues to be used for this flow.

Regardless of whether direct forwarding by gateway nodes or redirection by switching nodes is implemented, an example embodiment of the present invention may include certain mechanisms for removing stale flow entries, as previously noted in reference to FIGS. 8A-8C as well as FIGS. 11A and 11B. Once a flow terminates, there would be no further packets that match the corresponding flow entry, and eventually an IDLE timer will expire. A switching node, in response, may delete the flow entry and generate a notification to an SDN controller in an example embodiment. Responsive to a flow entry deletion notification or message, the SDN controller may proceed to remove the corresponding flow in its cache. Furthermore, the SDN controller may also withdraw or inactivate the corresponding n-tuple route advertised to the gateway nodes using BGP-FS, as set forth hereinabove. Depending on whether direct forwarding by gateway nodes or redirection by switching nodes, or a combination of both, is implemented in an example DC environment, suitable timer mechanisms, stale entry deletion procedures and/or inactivation of BGP-FS routes may be accordingly configured in an embodiment.
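The teardown chain described above may be sketched as follows (all names are assumptions): the switching node's idle-timer expiry triggers a notification, and the controller in turn drops its cached state and withdraws the BGP-FS route.

    def on_idle_timeout(n_tuple, notify_controller):
        notify_controller("flow-removed", n_tuple)   # switching node side

    def on_flow_removed(n_tuple, flow_cache, withdraw_bgp_fs):
        flow_cache.pop(n_tuple, None)                # controller drops cached flow state
        withdraw_bgp_fs(n_tuple)                     # gateways delete the bypass route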

In the example embodiments set forth herein, a DC switching node is configured to forward a packet from every new flow to the associated SDN controller. As the network scales, processing of every new flow can create challenges at the controller. Accordingly, additional and/or alternative variations may be implemented in accordance with the teachings of the present patent application to address potential scalability challenges. It should be noted that the impact of DC-GW failures may not be critical for all flows. For example, certain UDP flows may not need to be directed to the same service instance. By providing a list of services for which the (dis)connection is critical (e.g., a white list or a black list), an SDN controller may be configured to selectively enable one or more of the features of the present invention for a minor subset of flows. By reducing the number of punt packets to the controller, it is possible to scale an embodiment of the present invention to larger DC networks.

Further, session termination due to DC-GW failures particularly penalizes those flows that have already sent a substantial amount of data (as opposed to flows that have sent only a small amount of traffic). Accordingly, it is more important to ensure that the session does not terminate for such large flows. To that end, an embodiment may provide that, instead of punting a packet from every new flow to the controller, the learning feature will only add a new entry as discussed hereinabove for certain flows. Subsequent packets will be forwarded based on the new entry. Thereafter, the switch can send a notification to the controller only if the flow counter exceeds a preconfigured value (e.g., greater than 1 MB). In other words, the flow session is maintained during a DC-GW failure only for those flows that have already sent at least 1 MB. An example embodiment may, therefore, utilize a statistics-based trigger in the OpenFlow specification (referred to as OFPIT_STAT_TRIGGER, supported in OF ver. 1.5) to further modulate the packet punting behavior of a DC switching node. Given that not all flows transmit 1 MB, the number of punt packets received by the controller can be substantially reduced in such an arrangement, leading to increased scalability in larger networks.
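The statistics-based variant may be sketched as follows (OFPIT_STAT_TRIGGER itself is an OpenFlow 1.5 instruction and is only modeled here in software; names and the threshold are assumptions): the controller is notified only once a flow has carried more than the configured byte count.

    THRESHOLD_BYTES = 1 * 1024 * 1024      # e.g., 1 MB

    flow_bytes = {}                        # n-tuple -> bytes forwarded so far
    notified = set()

    def account(n_tuple, pkt_len, notify_controller):
        flow_bytes[n_tuple] = flow_bytes.get(n_tuple, 0) + pkt_len
        if flow_bytes[n_tuple] > THRESHOLD_BYTES and n_tuple not in notified:
            notified.add(n_tuple)          # one-shot trigger per flow
            notify_controller(n_tuple)     # only "large" flows reach the controller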

Skilled artisans will appreciate upon reference hereto that example embodiments of the present invention advantageously overcome various shortcomings and deficiencies of existing DC management solutions. For example, where symmetric DC-GWs that use identical hashing mechanisms are provided (so as to ensure consistency of the ECMP forwarding behavior of a gateway to which the packet flows are failed over), such hashing mechanisms for ECMP are often proprietary, leading to a closed ecosystem of data centers. Further, symmetric hardware may require the data center to be set up with DC-GWs from the same vendor and may further require the hardware and software versions on the two DC-GWs to be identical. This would further reduce interoperability, giving rise to significant impediments in organic expansion of DC networks.

Another existing solution involves service instance TCP state mirroring in a DC environment, where the TCP state of all service instances is mirrored. Consequently, if a flow is redirected to another service instance, that instance would have the necessary state information to process the packet and proceed further. However, this arrangement poses severe constraints on scalability: as more service instances are deployed, the number of mirroring sessions will increase dramatically (e.g., quadratically). Also, such an arrangement is not an efficient solution since every service that is deployed in the data center is mirrored while not all flows require state consistency. Yet another existing solution involves using load balancing as a service (LBaaS), where a stateful load balancer is used in cloud-centric deployments for load balancing sticky flows. However, it should be noted that the load balancer itself introduces an additional hop on the data plane, which may not be desirable in latency-sensitive applications such as NFV. Additionally, the load balancer itself might have multiple instances, and spraying the packets across multiple instances of the load balancer itself relies on ECMP (effectively recreating the same problem at another level).

Turning to FIG. 16, depicted therein is a network function virtualization (NFV) architecture 1600 that may be applied in conjunction with a data center deployment of the present invention, e.g., similar to the embodiments set forth in FIGS. 1 and 2. Various physical resources and services executing in an example DC environment may be provided as virtual appliances wherein the resources and service functions are virtualized into suitable virtual network functions (VNFs) via a virtualization layer 1610. Resources 1602 comprising compute resources 1604, memory resources 1606, and network infrastructure resources 1608 are virtualized into corresponding virtual resources 1612, wherein virtual compute resources 1614, virtual memory resources 1616 and virtual network resources 1618 are collectively operative to support a VNF layer 1620 including a plurality of VNFs 1622-1 to 1622-N, which may be managed by respective element management systems (EMS) 1623-1 to 1623-N. Virtualization layer 1610 (also sometimes referred to as a virtual machine monitor (VMM) or "hypervisor", similar to hypervisors 120 and 140 in FIG. 1) together with the physical resources 1602 and virtual resources 1612 may be referred to as the NFV infrastructure (NFVI) of a network environment. Overall NFV management and orchestration functionality 1626 may be supported by one or more virtualized infrastructure managers (VIMs) 1632, one or more VNF managers 1630 and an orchestrator 1628, wherein VIM 1632 and VNF managers 1630 are interfaced with the virtualization layer and the VNF layer, respectively. A converged OSS platform 1624 (which may be integrated or co-located with a BSS in some arrangements) is responsible for network-level functionalities such as network management, fault management, configuration management, service management, and subscriber management, etc., as well as interfacing with DC controllers, as noted previously. In one arrangement, various OSS components of the OSS platform 1624 may interface with VNF layer 1620 and NFV orchestration 1628 via suitable interfaces. In addition, OSS/BSS 1624 may be interfaced with a configuration module 1634 for facilitating service instantiation and chaining, VNF and infrastructure description input, as well as policy-based flow management. Broadly, NFV orchestration 1628 involves generating, maintaining and tearing down network services or service functions supported by corresponding VNFs, including creating end-to-end services over multiple VNFs in a network environment (e.g., service chaining for various data flows from ingress nodes to egress nodes). Further, NFV orchestrator 1628 is also responsible for global resource management of NFVI resources, e.g., managing compute, storage and networking resources among multiple VIMs in the network.

FIGS. 17A/17B illustrate connectivity between network devices (NDs) within an exemplary DC network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention, wherein at least a portion of the DC network and/or associated nodes/components shown in some of the Figures previously discussed may be implemented in a virtualized environment that may involve, supplement and/or complement the embodiments of FIGS. 1, 2 and 16. In particular, FIG. 17A shows NDs 1700A-H, which may be representative of various servers, service nodes, switching nodes, gateways, controller nodes, as well as other network elements of a DC network environment, and the like, wherein example connectivity is illustrated by way of lines between A-B, B-C, C-D, D-E, E-F, F-G, and A-G, as well as between H and each of A, C, D, and G. As noted elsewhere in the patent application, such NDs may be provided as physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link). An additional line extending from NDs 1700A, E, and F illustrates that these NDs may act as ingress and egress nodes for the network (and thus, these NDs are sometimes referred to as edge NDs, while the other NDs may be called core NDs).

Two of the exemplary ND implementations in FIG. 17A are: (1) a special-purpose network device 1702 that uses custom application-specific integrated circuits (ASICs) and a proprietary operating system (OS); and (2) a general purpose network device 1704 that uses common off-the-shelf (COTS) processors and a standard OS.

The special-purpose network device 1702 includes appropriate hardware 1710 (e.g., custom or application-specific hardware) comprising compute resource(s) 1712 (which typically include a set of one or more processors), forwarding resource(s) 1714 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 1716 (sometimes called physical ports), as well as non-transitory machine readable storage media 1718 having stored therein suitable application-specific software or program instructions 1720 (e.g., switching, routing, call processing, etc.). A physical NI is a piece of hardware in an ND through which a network connection (e.g., wirelessly through a wireless network interface controller (WNIC) or through plugging in a cable to a physical port connected to a network interface controller (NIC)) is made, such as those shown by the connectivity between NDs 1700A-H. During operation, the application software 1720 may be executed by the hardware 1710 to instantiate a set of one or more application-specific or custom software instance(s) 1722. Each of the custom software instance(s) 1722, and that part of the hardware 1710 that executes that application software instance (be it hardware dedicated to that application software instance and/or time slices of hardware temporally shared by that application software instance with others of the application software instance(s) 1722), forms a separate virtual network element 1730A-R. Each of the virtual network element(s) (VNEs) 1730A-R includes a control communication and configuration module 1732A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 1734A-R with respect to suitable application/service instances 1733A-R, such that a given virtual network element (e.g., 1730A) includes the control communication and configuration module (e.g., 1732A), a set of one or more forwarding table(s) (e.g., 1734A), and that portion of the application hardware 1710 that executes the virtual network element (e.g., 1730A) for supporting one or more suitable application/service instances 1733A, and the like.

In an example implementation, the special-purpose network device 1702 is often physically and/or logically considered to include: (1) a ND control plane 1724 (sometimes referred to as a control plane) comprising the compute resource(s) 1712 that execute the control communication and configuration module(s) 1732A-R; and (2) a ND forwarding plane 1726 (sometimes referred to as a forwarding plane, a data plane, or a bearer plane) comprising the forwarding resource(s) 1714 that utilize the forwarding or destination table(s) 1734A-R and the physical NIs 1716. By way of example, where the ND is a DC node, the ND control plane 1724 (the compute resource(s) 1712 executing the control communication and configuration module(s) 1732A-R) is typically responsible for participating in controlling how bearer traffic (e.g., voice/data/video) is to be routed. Likewise, ND forwarding plane 1726 is responsible for receiving that data on the physical NIs 1716 (e.g., similar to I/Fs 1818 and 1820 in FIG. 18, described below) and forwarding that data out the appropriate ones of the physical NIs 1716 based on the forwarding information.

FIG. 17B illustrates an exemplary way to implement the special-purpose network device 1702 according to some embodiments of the invention, wherein an example special-purpose network device includes one or more cards 1738 (typically hot pluggable) coupled to an interconnect mechanism. While in some embodiments the cards 1738 are of two types (one or more that operate as the ND forwarding plane 1726 (sometimes called line cards), and one or more that operate to implement the ND control plane 1724 (sometimes called control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card). A service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec) (RFC 4301 and 4309), Secure Sockets Layer (SSL)/Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway), etc.). By way of example, a service card may be used to terminate IPsec tunnels and execute the attendant authentication and encryption algorithms. These cards may be coupled together through one or more interconnect mechanisms illustrated as backplane 1736 (e.g., a first full mesh coupling the line cards and a second full mesh coupling all of the cards).

Returning to FIG. 17A, an example embodiment of the general purpose network device 1704 includes hardware 1740 comprising a set of one or more processor(s) 1742 (which are often COTS processors) and network interface controller(s) 1744 (NICs; also known as network interface cards) (which include physical NIs 1746), as well as non-transitory machine readable storage media 1748 having stored therein software 1750, e.g., general purpose operating system software, similar to the embodiments set forth below in reference to FIG. 19 in one example. During operation, the processor(s) 1742 execute the software 1750 to instantiate one or more sets of one or more applications 1764A-R with respect to facilitating DC service instances. While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization, represented by a virtualization layer 1754 and software containers 1762A-R. For example, one such alternative embodiment implements operating system-level virtualization, in which case the virtualization layer 1754 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple software containers 1762A-R that may each be used to execute one of the sets of applications 1764A-R. In this embodiment, the multiple software containers 1762A-R (also called virtualization engines, virtual private servers, or jails) are each a user space instance (typically a virtual memory space); these user space instances are separate from each other and separate from the kernel space in which the operating system is run; the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes. Another such alternative embodiment implements full virtualization, in which case: (1) the virtualization layer 1754 represents a hypervisor (sometimes referred to as a VMM, as noted elsewhere in the present patent application) or a hypervisor executing on top of a host operating system; and (2) the software containers 1762A-R each represent a tightly isolated form of software container called a virtual machine that is run by the hypervisor and may include a guest operating system. A virtual machine is a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine; and applications generally do not know they are running on a virtual machine as opposed to running on a "bare metal" host electronic device, though some systems provide para-virtualization, which allows an operating system or application to be aware of the presence of virtualization for optimization purposes.

The instantiation of the one or more sets of one or more applications 1764A-R, as well as the virtualization layer 1754 and software containers 1762A-R if implemented, are collectively referred to as software instance(s) 1752. Each set of applications 1764A-R, the corresponding software container 1762A-R if implemented, and that part of the hardware 1740 that executes them (be it hardware dedicated to that execution and/or time slices of hardware temporally shared by software containers 1762A-R), forms a separate virtual network element(s) 1760A-R.

The virtual network element(s) 1760A-R perform similar functionality to the virtual network element(s) 1730A-R, e.g., similar to the control communication and configuration module(s) 1732A and forwarding table(s) 1734A (this virtualization of the hardware 1740 is sometimes referred to as NFV architecture, as mentioned elsewhere in the present patent application). Thus, NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which could be located in data centers, NDs, and customer premise equipment (CPE). However, different embodiments of the invention may implement one or more of the software container(s) 1762A-R differently. For example, while embodiments of the invention may be practiced in an arrangement wherein each software container 1762A-R corresponds to one VNE 1760A-R, alternative embodiments may implement this correspondence at a finer level of granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of software containers 1762A-R to VNEs also apply to embodiments where such a finer level of granularity is used.

In certain embodiments, the virtualization layer 1754 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between software containers 1762A-R and the NIC(s) 1744, as well as optionally between the software containers 1762A-R. In addition, this virtual switch may enforce network isolation between the VNEs 1760A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).

The third exemplary ND implementation in FIG. 17A is a hybrid network device 1706, which may include both custom ASICs/proprietary OS and COTS processors/standard OS in a single ND or a single card within an ND. In certain embodiments of such a hybrid network device, a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 1702) could provide for para-virtualization to the application-specific hardware present in the hybrid network device 1706 for effectuating one or more components, blocks, modules, and functionalities of an example DC network environment.

Regardless of the above exemplary implementations of an ND, when a single one of multiple VNEs implemented by an ND is being considered (e.g., only one of the VNEs is part of a given virtual network) or where only a single VNE is currently being implemented by an ND, the shortened term network element (NE) is sometimes used to refer to that VNE. Also, in all of the above exemplary implementations, each of the VNEs (e.g., VNE(s) 1730A-R, VNEs 1760A-R, and those in the hybrid network device 1706) receives data on the physical NIs (e.g., 1716, 1746) and forwards that data out the appropriate ones of the physical NIs (e.g., 1716, 1746).

Turning to FIG. 18, depicted therein is a block diagram of a computer-implemented apparatus 1800 that may be (re)configured and/or (re)arranged as a platform, server, node or element, e.g., a DC switching node, a DC gateway node, and/or an associated SDN controller, to effectuate an example DC failure management system and method according to an embodiment of the present patent disclosure. It should be appreciated that apparatus 1800 may also be implemented as part of a distributed data center platform in some arrangements, including, e.g., geographically distributed DC networks. One or more processors 1802 may be operatively coupled to various modules that may be implemented in persistent memory for executing suitable program instructions or code portions with respect to effectuating various aspects of DC failure management, e.g., load balancing, BGP-FS route computing, flow caching, policy configuration, etc., as exemplified by one or more modules as illustrated. As a DC switching node, apparatus 1800 may include a new flow identification database 1810 as well as a match-action rule database 1814. In a DC gateway node implementation, apparatus 1800 may include an FS route database (FIB) 1806 as well as an LPM FIB 1816 and an associated ECMP/load balancing mechanism 1822. As an SDN controller, a flow cache 1826 may be implemented as well as suitable program instructions for executing various processes set forth in the present patent application. A suitable timer mechanism 1804 may be provided as part of one or more implementations for aging out stale flow entries from different nodes of the DC network. Depending on the implementation, appropriate "upstream" interfaces (I/F) 1818 and/or "downstream" I/Fs 1820 may be provided for interfacing with external nodes (e.g., OSS/BSS nodes or customer management nodes), other network elements, and/or other OSS components, etc. Accordingly, depending on the context, interfaces selected from interfaces 1818, 1820 may sometimes be referred to as a first interface, a second interface, and so on. In a further arrangement, a Big Data analytics module (not shown in this FIG.) may be operative in conjunction with a DC network environment where vast quantities of subscriber end station data, customer/tenant service data, service state information, etc. may need to be curated, manipulated, and analyzed for facilitating DC operations in a multi-domain heterogeneous network environment.

Accordingly, various hardware and software blocks configured for effectuating an example DC failure management scheme including flow redirection and/or direct forwarding functionality may be embodied in NDs, NEs, NFs, VNE/VNF/VND, virtual appliances, virtual machines, and the like, as well as electronic devices and machine-readable media, which may be configured as any of the apparatuses described herein. One skilled in the art will therefore recognize that various apparatuses and systems with respect to the foregoing embodiments, as well as the underlying network infrastructures set forth above, may be architected in a virtualized environment according to a suitable NFV architecture in additional or alternative embodiments of the present patent disclosure, as noted above in reference to FIG. 16. Accordingly, for purposes of at least one embodiment of the present invention, the following detailed description may be additionally and/or alternatively provided, mutatis mutandis, in an example implementation with respect to the DC components and/or the associated network elements of a network environment.

An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other forms of propagated signals, such as carrier waves and infrared signals). Thus, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, or a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on, that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device. Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices. For example, the set of physical NIs (or the set of physical NI(s) in combination with the set of processors executing code) may perform any formatting, coding, or translating to allow the electronic device to send and receive data whether over a wired and/or a wireless connection. In some embodiments, a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection or channel and/or sending data out to other devices via a wireless connection or channel. This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication. The radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s).

In some embodiments, the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter. The NIC(s) may facilitate connecting the electronic device to other electronic devices, allowing them to communicate via wire through plugging in a cable to a physical port connected to a NIC. One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.

A network device (ND) or network element (NE) as set forth hereinabove is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices, etc.). Some network devices are "multiple services network devices" that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video). The apparatus, and the method performed thereby, of the present invention may be embodied in one or more ND/NE nodes that may be, in some embodiments, communicatively connected to other electronic devices on the network (e.g., other network devices, servers, nodes, terminals, etc.). The example NE/ND node may comprise processor resources, memory resources, and at least one interface. These components may work together to provide various DC functionalities as disclosed herein.

Memory may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, ROM, flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other forms of propagated signals, such as carrier waves and infrared signals). For instance, memory may comprise non-volatile memory containing code to be executed by the processor. Where memory is non-volatile, the code and/or data stored therein can persist even when the network device is turned off (when power is removed). In some instances, while the network device is turned on, that part of the code that is to be executed by the processor(s) may be copied from non-volatile memory into volatile memory of the network device.

The at least one interface may be used in the wired and/or wireless communication of signaling and/or data to or from the network device. For example, the interface may perform any formatting, coding, or translating to allow the network device to send and receive data whether over a wired and/or a wireless connection. In some embodiments, the interface may comprise radio circuitry capable of receiving data from other devices in the network over a wireless connection and/or sending data out to other devices via a wireless connection. In some embodiments, the interface may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, local area network (LAN) adapter or physical network interface. The NIC(s) may facilitate connecting the network device to other devices, allowing them to communicate via wire through plugging in a cable to a physical port connected to a NIC. As explained above, in particular embodiments, the processor may represent part of the interface, and some or all of the functionality described as being provided by the interface may be provided more specifically by the processor.

The components of the network device are each depicted as separate boxes located within a single larger box for reasons of simplicity in describing certain aspects and features of the network device disclosed herein. In practice, however, one or more of the components illustrated in the example network device may comprise multiple different physical elements.

One or more embodiments described herein may be implemented in the network device by means of a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the actions according to any of the invention's features and embodiments, where appropriate. While the modules are illustrated as being implemented in software stored in memory, other embodiments implement part or all of each of these modules in hardware.

In one embodiment, the software implements the modules described with regard to the Figures herein. During operation, the software may be executed by the hardware to instantiate a set of one or more software instance(s). Each of the software instance(s), and that part of the hardware that executes that software instance (be it hardware dedicated to that software instance, hardware in which a portion of available physical resources (e.g., a processor core) is used, and/or time slices of hardware temporally shared by that software instance with others of the software instance(s)), forms a separate virtual network element. Thus, in the case where there are multiple virtual network elements, each operates as one of the network devices.

Some of the described embodiments may also be used where various levels or degrees of virtualization have been implemented. In certain embodiments, one, some or all of the applications relating to a DC network architecture may be implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application. As a unikernel can be implemented to run directly on hardware, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container, embodiments can be implemented fully with unikernels running directly on a hypervisor represented by a virtualization layer, unikernels running within software containers represented by instances, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both running directly on a hypervisor, unikernels and sets of applications that are run in different software containers).

The instantiation of the one or more sets of one or more applications, as well as the virtualization if implemented, are collectively referred to as software instance(s). Each set of applications, the corresponding virtualization construct if implemented, and that part of the hardware that executes them (be it hardware dedicated to that execution and/or time slices of hardware temporally shared by software containers), forms a separate virtual network element(s).

A virtual network is a logical abstraction of a physical network that provides network services (e.g., L2 and/or L3 services). A virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., Layer 2 (L2, data link layer) and/or Layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), Layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).

A network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network. A virtual network instance (VNI) is a specific instance of a virtual network on an NVE (e.g., a NE/VNE on an ND, or a part of a NE/VNE on an ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND). A virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be physical or virtual ports identified through logical interface identifiers (e.g., a VLAN ID).

Examples of network services also include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IPVPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network). Example network services that may be hosted by a data center may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network-originated attacks and to avoid malformed route announcements), and management capabilities (e.g., full detection and processing).

Embodiments of a DC environment and/or associated heterogeneous multi-domain networks may involve distributed routing, centralized routing, or a combination thereof. The distributed approach distributes responsibility for generating the reachability and forwarding information across the NEs; in other words, the process of neighbor discovery and topology discovery is distributed. For example, where the network device is a traditional router, the control communication and configuration module(s) of the ND control plane typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP)), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching (GMPLS) Signaling RSVP-TE)) that communicate with other NEs to exchange routes, and then selects those routes based on one or more routing metrics. Thus, the NEs perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information. Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane. The ND control plane programs the ND forwarding plane with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane programs the adjacency and route information into one or more forwarding table(s) (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane. For Layer 2 forwarding, the ND can store one or more bridging tables that are used to forward data based on the Layer 2 information in that data. While the above example uses the special-purpose network device, the same distributed approach can be implemented on a general purpose network device and a hybrid network device, e.g., as exemplified in the embodiments of FIGS. 17A/17B described above.
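
By way of a non-limiting illustration of the distributed approach described above, the following sketch (hypothetical names; a real implementation would additionally account for, e.g., administrative distance and longest-prefix matching) shows best routes being selected from a RIB by metric and programmed into a FIB used by the forwarding plane.

# Illustrative sketch only (hypothetical types): routes learned from routing protocols
# are stored in a RIB; the best route per prefix (here simply the lowest metric) is
# selected and programmed into the FIB consulted by the forwarding plane.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Route:
    prefix: str        # e.g., "10.1.0.0/16"
    next_hop: str
    out_interface: str
    metric: int
    protocol: str      # e.g., "BGP", "OSPF", "IS-IS"

def build_fib(rib: List[Route]) -> Dict[str, Route]:
    """Select the lowest-metric route for each prefix (control plane role)."""
    fib: Dict[str, Route] = {}
    for route in rib:
        best = fib.get(route.prefix)
        if best is None or route.metric < best.metric:
            fib[route.prefix] = route
    return fib

def program_forwarding_plane(fib: Dict[str, Route]) -> None:
    """Push the selected adjacency/route information down to the forwarding table(s)."""
    for prefix, route in fib.items():
        print(f"install {prefix} -> next hop {route.next_hop} via {route.out_interface}")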

Certain NDs (e.g., certain edge NDs) internally represent end user devices (or sometimes customer premise equipment (CPE) such as a residential gateway (e.g., a router, modem)) using subscriber circuits. A subscriber circuit uniquely identifies within the ND a subscriber session and typically exists for the lifetime of the session. Thus, an ND typically allocates a subscriber circuit when the subscriber connects to that ND, and correspondingly de-allocates that subscriber circuit when that subscriber disconnects. Each subscriber session represents a distinguishable flow of packets communicated between the ND and an end user device (or sometimes CPE such as a residential gateway or modem) using a protocol, such as the point-to-point protocol over another protocol (PPPoX) (e.g., where X is Ethernet or Asynchronous Transfer Mode (ATM)), Ethernet, 802.1Q Virtual LAN (VLAN), Internet Protocol, or ATM. A subscriber session can be initiated using a variety of mechanisms (e.g., manual provisioning of a dynamic host configuration protocol (DHCP), DHCP/client-less internet protocol service (CLIPS) or Media Access Control (MAC) address tracking). For example, the point-to-point protocol (PPP) is commonly used for digital subscriber line (DSL) services and requires installation of a PPP client that enables the subscriber to enter a username and a password, which in turn may be used to select a subscriber record. When DHCP is used (e.g., for cable modem services), a username typically is not provided; but in such situations other information (e.g., information that includes the MAC address of the hardware in the end user device (or CPE)) is provided. The use of DHCP and CLIPS on the ND captures the MAC addresses and uses these addresses to distinguish subscribers and access their subscriber records.
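
Purely as an illustrative sketch (hypothetical names, not part of any embodiment), the following shows a subscriber circuit being allocated when a subscriber connects, keyed either by a PPP username or by a MAC address learned via DHCP/CLIPS, and de-allocated when the subscriber disconnects.

# Illustrative sketch only (hypothetical types): a subscriber circuit uniquely identifies
# a subscriber session within the ND for the lifetime of that session.

from dataclasses import dataclass
from typing import Dict, Optional
import itertools

@dataclass
class SubscriberCircuit:
    circuit_id: int
    username: Optional[str]     # PPP-based access (e.g., DSL)
    mac_address: Optional[str]  # DHCP/CLIPS-based access (e.g., cable modem)

class SubscriberTable:
    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._circuits: Dict[str, SubscriberCircuit] = {}   # session key -> circuit

    def connect(self, username: Optional[str] = None,
                mac_address: Optional[str] = None) -> SubscriberCircuit:
        """Allocate a subscriber circuit when the subscriber connects."""
        key = username or mac_address
        circuit = SubscriberCircuit(next(self._next_id), username, mac_address)
        self._circuits[key] = circuit
        return circuit

    def disconnect(self, key: str) -> None:
        """De-allocate the subscriber circuit when the subscriber disconnects."""
        self._circuits.pop(key, None)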

Furthermore, skilled artisans will also appreciate that where an example DC platform is implemented in association with a cloud-computing environment, it may comprise one or more of private clouds, public clouds, hybrid clouds, community clouds, distributed clouds, multiclouds and interclouds (e.g., “cloud of clouds”), and the like, wherein various types of services and applications may be supported. Example hosted services/applications may include, but are not limited to: cloud storage resources, processor compute resources, network bandwidth resources, load balancing services, virtualized network infrastructure resources, Software as a Service (SaaS) services, Platform as a Service (PaaS) services, Infrastructure as a Service (IaaS) services, streaming media services, email services, social media network services, virtual routing services, voice telephony/VoIP services, and one or more inline services such as, e.g., Deep Packet Inspection (DPI) services, Virus Scanning (VS) services, Intrusion Detection and Prevention (IDP) services, Firewall (FW) filtering services and Network Address Translation (NAT) services, and the like.

In the above description of various embodiments of the present disclosure, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art, and may not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

At least some example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. Such computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, so that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s). Additionally, the computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks.

As pointed out previously, a tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a ROM circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/Blu-ray). The computer program instructions may also be loaded onto or otherwise downloaded to a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process. Accordingly, embodiments of the present invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor or controller, which may collectively be referred to as “circuitry,” “a module” or variants thereof. Further, an example processing unit may include, by way of illustration, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. As can be appreciated, an example processor unit may employ distributed processing in certain embodiments.

Further, in at least some additional or alternative implementations, the functions/acts described in the blocks may occur out of the order shown in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Furthermore, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction relative to the depicted arrows. Finally, other blocks may be added/inserted between the blocks that are illustrated.

It should therefore be clearly understood that the order or sequence of the acts, steps, functions, components or blocks illustrated in any of the flowcharts depicted in the drawing Figures of the present disclosure may be modified, altered, replaced, customized or otherwise rearranged within a particular flowchart, including deletion or omission of a particular act, step, function, component or block. Moreover, the acts, steps, functions, components or blocks illustrated in a particular flowchart may be inter-mixed or otherwise inter-arranged or rearranged with the acts, steps, functions, components or blocks illustrated in another flowchart in order to effectuate additional variations, modifications and configurations with respect to one or more processes for purposes of practicing the teachings of the present patent disclosure.

Although various embodiments have been shown and described in detail, the claims are not limited to any particular embodiment or example. None of the above Detailed Description should be read as implying that any particular component, element, step, act, or function is essential such that it must be included in the scope of the claims. Reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Accordingly, those skilled in the art will recognize that the exemplary embodiments described herein can be practiced with various modifications and alterations within the spirit and scope of the claims appended below.

1. A method operating at a Software Defined Networking (SDN) controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes, the method comprising: receiving a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network; determining that the packet flow requires flow stickiness with respect to the subscriber session; and responsive to the determining, programming each switching node other than the first switching node to redirect any packets of the packet flow received by the switching nodes to the first switching node, the packets arriving at the switching nodes from at least a subset of the plurality of gateway nodes according to respective Equal Cost Multi Path (ECMP) routing paths.
2. The method as recited in claim 1, wherein each switching node is programmed by the SDN controller to redirect the packets of the packet flow by sending one or more OpenFlow specification match-action rules to the switching nodes that identify the first switching node as a destination node.
3. The method as recited in claim 1, further comprising: generating instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets.
4. The method as recited in claim 1, further comprising: determining that the subscriber session has terminated; and responsive to the determination that the subscriber session has terminated, instructing the switching nodes to inactivate the match-action rules with respect to the packet flow associated with the subscriber session.
5. The method as recited in claim 1, wherein the n-tuple identifier set comprises at least one of: a source IP address, a source port number, a destination IP address, a destination port number and a protocol in use.
6. The method as recited in claim 1, further comprising: receiving a subscriber service policy message indicating that the subscriber session requires flow stickiness.
7. The method as recited in claim 1, further comprising: performing a statistical analysis of the packet flow to determine that the packet flow requires flow stickiness.
8. A Software Defined Networking (SDN) controller associated with a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes, the SDN controller comprising: one or more processors; and a persistent memory module coupled to the one or more processors and having program instructions thereon, which when executed by the one or more processors perform the following: generate instructions to the plurality of switching nodes to forward first packets of new packet flows respectively received thereat to the SDN controller, the first packets each containing respective n-tuple identifier sets; responsive to the instructions, receive a first packet of a packet flow from a first switching node coupled to a service instance hosted by the data center, the packet flow having an n-tuple identifier set and flowing from a gateway node pursuant to a subscriber session via an external network; determine that the packet flow requires flow stickiness with respect to the subscriber session; and responsive to the determining, program each switching node other than the first switching node to redirect any packets of the packet flow received by the switching nodes to the first switching node, the packets arriving at the switching nodes from at least a subset of the plurality of gateway nodes according to respective Equal Cost Multi Path (ECMP) routing paths.
9. The SDN controller as recited in claim 8, wherein the program instructions further comprise instructions configured to program the switching nodes by sending one or more OpenFlow specification match-action rules that identify the first switching node as a destination node.
10. The SDN controller as recited in claim 8, wherein the program instructions further comprise instructions configured to: determine that the subscriber session has terminated; and responsive to the determination that the subscriber session has terminated, instruct the switching nodes to inactivate the match-action rules with respect to the packet flow associated with the subscriber session.
11. The SDN controller as recited in claim 8, wherein the n-tuple identifier set comprises at least one of: a source IP address, a source port number, a destination IP address, a destination port number and a protocol in use.
12. The SDN controller as recited in claim 8, wherein the program instructions further comprise instructions configured to determine that the packet flow requires flow stickiness responsive to receiving a subscriber service policy message regarding the subscriber session.
13. The SDN controller as recited in claim 8, wherein the program instructions further comprise instructions configured to perform a statistical analysis of the packet flow to determine that the packet flow requires flow stickiness.
14. A method operating at a switching node of a data center having a plurality of switching nodes coupled in a data center network fabric to a plurality of gateway nodes and controlled by a Software Defined Networking (SDN) controller, the method comprising: forwarding first packets of new packet flows received at the switching node to the SDN controller, the first packets each containing respective n-tuple identifier sets of corresponding packet flows and flowing from one or more gateway nodes pursuant to respective subscriber sessions via an external network; populating a flow identification database for identifying the new packet flows as they arrive at the switching node; receiving packets at the switching node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set; determining that the particular packet flow is not a new packet flow by comparing the particular n-tuple identifier set of the particular packet flow against the flow identification database; responsive to the determining, applying a programming rule provided by the SDN controller with respect to the particular packet flow, the programming rule operating to identify a destination switching node other than the switching node receiving the particular packet flow; and redirecting the packets of the particular packet flow to the destination switching node identified according to the programming rule.
15. The method as recited in claim 14, wherein the packets of the particular packet flow are received from a gateway node according to an Equal Cost Multi Path (ECMP) path.
16. The method as recited in claim 14, wherein the programming rule comprises at least one match-action rule provided by the SDN controller according to OpenFlow specification.
17. The method as recited in claim 14, wherein each n-tuple identifier set comprises at least one of: a source IP address, a source port number, a destination IP address, a destination port number and a protocol in use.
18. The method as recited in claim 14, further comprising: deleting an entry in the flow identification database responsive to at least one of a command from the SDN controller, reaching a timeout value, and determining that the subscriber session has terminated.
19. A switching node associated with a data center having a plurality of gateway nodes and a plurality of switching nodes coupled in a data center network fabric and controlled by a Software Defined Networking (SDN) controller, the switching node comprising: one or more processors; and a persistent memory module coupled to the one or more processors and having program instructions thereon, which perform the following when executed by the one or more processors: forward first packets of new packet flows received at the switching node to the SDN controller, the first packets each containing respective n-tuple identifier sets of corresponding packet flows and flowing from one or more gateway nodes pursuant to respective subscriber sessions via an external network; populate a flow identification database for identifying the new packet flows as they arrive at the switching node; receive packets at the switching node for a particular packet flow pursuant to a subscriber session via an external network, the particular packet flow having a particular n-tuple identifier set; determine that the particular packet flow is not a new packet flow by comparing the particular n-tuple identifier set of the particular packet flow against the flow identification database; responsive to the determining, apply a programming rule provided by the SDN controller with respect to the particular packet flow, the programming rule operating to identify a destination switching node other than the switching node receiving the particular packet flow; and redirect the packets of the particular packet flow to the destination switching node identified according to the programming rule.
20. The switching node as recited in claim 19, wherein the packets of the particular packet flow are received from a gateway node according to an Equal Cost Multi Path (ECMP) path.
21. The switching node as recited in claim 19, wherein the programming rule comprises at least one match-action rule provided by the SDN controller according to OpenFlow specification.
22. The switching node as recited in claim 19, wherein each n-tuple identifier set comprises at least one of: a source IP address, a source port number, a destination IP address, a destination port number and a protocol in use.
23. The switching node as recited in claim 19, further comprising program instructions for deleting an entry in the flow identification database responsive to at least one of a command from the SDN controller, reaching a timeout value, and determining that the subscriber session has terminated.